On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini <luca.rondan...@gmail.com> wrote:
> Hi everybody,
>
> I really need some good advice! I need to index in Lucene something like
> 1.4 billion documents. I have experience with Lucene, but I've never worked
> with such a big number of documents. Also, this is just the number of docs
> at "start-up": they are going to grow, and fast.
>
> I don't have to tell you that I need the system to be fast and to support
> real-time updates to the documents.
>
> The first solution that came to my mind was to use ParallelMultiSearcher,
> splitting the index into many "sub-indexes" (how many docs per index?
> 100,000?), but I don't have experience with it and I don't know how well it
> will scale as the number of documents grows!
>
> A more solid solution seems to be building some kind of integration with
> Hadoop, but I didn't find much about Lucene and Hadoop integration.
>
> Any idea? Which direction should I go (pure Lucene or Hadoop)?
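[For reference, a minimal sketch of the ParallelMultiSearcher approach mentioned in the question, against the Lucene 3.0-era API. The shard paths, field name, and shard count are made up for illustration; in practice shards are usually sized by index size and hardware rather than a fixed document count.]

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ShardedSearchSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical shard locations, one Lucene index per shard.
            String[] shardPaths = { "/indexes/shard0", "/indexes/shard1", "/indexes/shard2" };

            // Open one read-only IndexSearcher per shard.
            Searchable[] searchers = new Searchable[shardPaths.length];
            for (int i = 0; i < shardPaths.length; i++) {
                searchers[i] = new IndexSearcher(
                        FSDirectory.open(new File(shardPaths[i])), true);
            }

            // ParallelMultiSearcher fans each query out to the sub-searchers
            // in parallel and merges the top hits.
            ParallelMultiSearcher searcher = new ParallelMultiSearcher(searchers);

            QueryParser parser = new QueryParser(Version.LUCENE_30, "body",
                    new StandardAnalyzer(Version.LUCENE_30));
            Query query = parser.parse("hello world");
            TopDocs hits = searcher.search(query, 10);
            System.out.println("total hits: " + hits.totalHits);

            searcher.close();
        }
    }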
There seems to be a common misconception about Hadoop regarding search.
Map-reduce as implemented in Hadoop is really only for batch-oriented jobs
(or jobs where you don't need a quick response time). It's definitely not
for serving normal queries (unless you have unusual requirements).

-Yonik
http://www.lucidimagination.com