Thank you both! Johannes, Katta seems interesting, but I will need to solve the problem of "hot" updates to the index.
Yonik, I see your point - so your suggestion would be to build an architecture based on ParallelMultiSearcher?

On Sun, Nov 21, 2010 at 3:48 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini
> <luca.rondan...@gmail.com> wrote:
> > Hi everybody,
> >
> > I really need some good advice! I need to index in Lucene something
> > like 1.4 billion documents. I have experience with Lucene, but I've
> > never worked with such a big number of documents. Also, this is just
> > the number of docs at "start-up": they are going to grow, and fast.
> >
> > I don't have to tell you that I need the system to be fast and to
> > support real-time updates to the documents.
> >
> > The first solution that came to my mind was to use
> > ParallelMultiSearcher, splitting the index into many "sub-indexes"
> > (how many docs per index? 100,000?), but I don't have experience with
> > it and I don't know how well it will scale as the number of documents
> > grows!
> >
> > A more solid solution seems to be building some kind of integration
> > with Hadoop. But I didn't find much about Lucene and Hadoop
> > integration.
> >
> > Any idea? Which direction should I go (pure Lucene or Hadoop)?
>
> There seems to be a common misconception about Hadoop regarding search.
> Map-reduce as implemented in Hadoop is really for batch-oriented jobs
> only (or those types of jobs where you don't need a quick response
> time). It's definitely not for normal queries (unless you have
> unusual requirements).
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
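For what it's worth, a ParallelMultiSearcher architecture also needs a rule for deciding which sub-index owns each document, both at indexing time and for the "hot" updates mentioned above. One common approach (a sketch of my own, not something from this thread) is to hash each document's unique key to a shard number; the `ShardRouter` class and the shard count of 14 below are purely hypothetical illustrations:

```java
import java.util.ArrayList;
import java.util.List;

public class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    // Route a document to a shard by hashing its unique key.
    // The ((h % n) + n) % n dance keeps the result non-negative
    // even when hashCode() is negative.
    public int shardFor(String docKey) {
        int h = docKey.hashCode();
        return ((h % numShards) + numShards) % numShards;
    }

    public static void main(String[] args) {
        // e.g. 14 shards of roughly 100M docs each for 1.4B docs (hypothetical sizing)
        ShardRouter router = new ShardRouter(14);
        List<Integer> shards = new ArrayList<Integer>();
        for (String key : new String[] {"doc-1", "doc-2", "doc-3"}) {
            shards.add(router.shardFor(key));
        }
        System.out.println(shards);
    }
}
```

With this in place, each shard would be a separate Lucene index with its own IndexWriter; an update for a given key goes only to `shardFor(key)`'s writer, and the searchers over all shards can be combined with `new ParallelMultiSearcher(searchables)` for querying. Whether 14 shards (or 100,000 docs per index, as asked above) is the right granularity would need benchmarking on the actual data.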