Hi Jim. Sure. Are you doing this on Lustre because you already had a large clustered filesystem to work from, or is your Lustre cluster dedicated to your image project? I have been investigating Lustre for a smallish-scale filesystem to serve as a storage pool for virtual machines, with scalable storage of 10TB+. Your use of Lustre is interesting to me since I also use Lucene, and a good part of the data in the virtual machine disk images I will be storing is index data that I will be doing parallel searches across.
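
A minimal sketch of what parallel search across sharded index data could look like, in plain Python rather than Lucene. Everything here is an assumption for illustration: `search_shard` and `parallel_search` are hypothetical names, each shard is a plain dict standing in for a separate index, and the score is a simple term count. It also assumes scores are comparable across shards, which may not hold in practice when each index computes its own term statistics.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, query, k):
    """Hypothetical per-shard search: return the top-k (score, doc_id) pairs.

    The score is just the number of times the query string occurs in the
    document text; a real index would do its own ranking here.
    """
    hits = [(text.count(query), doc_id)
            for doc_id, text in shard.items() if query in text]
    return heapq.nlargest(k, hits)


def parallel_search(shards, query, k=3):
    """Query every shard concurrently, then merge into a global top-k."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, query, k), shards))
    # Each shard returns at most k hits, so the merge step only has to
    # look at len(shards) * k candidates.
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


# Two toy shards standing in for separate indexes.
shards = [
    {"a1": "lustre stores images", "a2": "lucene index on lustre lustre"},
    {"b1": "virtual machine disk images", "b2": "lustre lustre lustre pool"},
]
top = parallel_search(shards, "lustre", k=2)
print(top)  # [(3, 'b2'), (2, 'a2')]
```

Asking each shard for only its own top k keeps the merge cheap, but it is only correct if the per-shard scores can be ranked against each other.
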
Regards,
David

On 11-Aug-09, at 2:40 PM, Jim McCusker wrote:

> On Tue, Aug 11, 2009 at 1:14 PM, David Pratt<[email protected]> wrote:
>> Hi Jim. That is pretty cool. See there are more than 300,000 records at
>> present. Curious about how this will work when you get into much larger
>> scale with RAM requirement to perform search since this goes up
>> substantially with lucene as number of docs goes up. I have tended to
>> look at sharding and parallel multisearch as means of horizontally scaling
>> Lucene by breaking into chunks. This approach is interesting and just
>> interested how you anticipate scale and performance with document growth.
>
> We haven't had significant RAM requirements with the numbers of
> documents we have at the moment. Nutch is a more complete solution for
> search that has support for parallel search, and I imagine that there
> are other good ways of doing parallel search. Back when JXTA was still
> something I used it to create parallel distributed search across
> people's desktops with pretty good results. Combining the search
> results can end up taking some work, though.
>
> Jim
> --
> Jim McCusker
> Programmer Analyst
> Krauthammer Lab, Pathology Informatics
> Yale School of Medicine
> [email protected] | (203) 785-6330
> http://krauthammerlab.med.yale.edu
