Hi Luca,

Katta is an open-source project that integrates Lucene with Hadoop:
http://katta.sourceforge.net
Johannes

2010/11/21 Luca Rondanini <luca.rondan...@gmail.com>

> Hi everybody,
>
> I really need some good advice! I need to index something like 1.4
> billion documents in Lucene. I have experience with Lucene, but I've
> never worked with such a large number of documents. Also, this is just
> the number of docs at "start-up": they are going to grow, and fast.
>
> I don't have to tell you that I need the system to be fast and to
> support real-time updates to the documents.
>
> The first solution that came to my mind was to use
> ParallelMultiSearcher, splitting the index into many "sub-indexes"
> (how many docs per index? 100,000?), but I don't have experience with
> it and I don't know how well it will scale as the number of documents
> grows!
>
> A more solid solution seems to be to build some kind of integration
> with Hadoop. But I didn't find much about Lucene and Hadoop
> integration.
>
> Any ideas? Which direction should I go (pure Lucene or Hadoop)?
>
> Thanks
> Luca

--
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878
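
For a rough sense of scale on the sub-index question quoted above, here is a minimal Python sketch of the arithmetic, using only the two figures from the question (1.4 billion documents, 100,000 per sub-index). The function names and the hash-based routing are illustrative assumptions; they are not part of Lucene's or Katta's API, and the thread does not say how documents would be assigned to sub-indexes.

```python
import zlib

TOTAL_DOCS = 1_400_000_000   # "1.4 billion documents" from the question
DOCS_PER_SHARD = 100_000     # per-index size floated in the question


def shard_count(total_docs: int, docs_per_shard: int) -> int:
    """Number of sub-indexes needed to hold total_docs (integer ceiling)."""
    return (total_docs + docs_per_shard - 1) // docs_per_shard


def shard_for(doc_id: str, num_shards: int) -> int:
    """Hypothetical routing: pick a sub-index from a stable hash of the id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


num_shards = shard_count(TOTAL_DOCS, DOCS_PER_SHARD)
print(num_shards)  # 14000
```

At 100,000 documents per sub-index, the starting corpus alone would need 14,000 sub-indexes, so the per-index size is worth revisiting before choosing between a single-machine searcher and a distributed setup.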