Hi Luca,

Katta is an open-source project that integrates Lucene with Hadoop:
http://katta.sourceforge.net
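
To make the "sub-index" idea from your message concrete, here is a minimal sketch of the general pattern: route each document to one of a fixed number of shards by hashing its id, query the shards independently, and merge the per-shard hits into one global top-k. This is not Katta's or Lucene's API. The names shard_for, merge_top_k and NUM_SHARDS are hypothetical, and the shard count is an arbitrary placeholder rather than a recommendation.

```python
import heapq
import zlib

NUM_SHARDS = 16  # hypothetical value; pick it from your own sizing tests


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document id to a shard using a stable hash.

    The same id always lands on the same shard, so an update to a
    document can be sent to the one sub-index that holds it.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def merge_top_k(per_shard_hits, k):
    """Merge per-shard lists of (score, doc_id) into one global top-k.

    Each shard is assumed to return its own best hits; the merge keeps
    the k highest scores overall, best first.
    """
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])
```

One caveat with this sketch: with plain modulo hashing, changing the shard count moves most documents to a different shard, so a growing collection would need either spare shards planned up front or a different routing scheme.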

Johannes

2010/11/21 Luca Rondanini <luca.rondan...@gmail.com>

> Hi everybody,
>
> I really need some good advice! I need to index something like 1.4 billion
> documents in Lucene. I have experience with Lucene, but I've never worked
> with such a large number of documents. Also, this is just the number of docs
> at "start-up": they are going to grow, and fast.
>
> I don't have to tell you that I need the system to be fast and to support
> real-time updates to the documents.
>
> The first solution that came to my mind was to use ParallelMultiSearcher,
> splitting the index into many "sub-indexes" (how many docs per index?
> 100,000?), but I don't have experience with it and I don't know how well
> it will scale as the number of documents grows!
>
> A more solid solution seems to be to build some kind of integration with
> Hadoop. But I didn't find much about Lucene and Hadoop integration.
>
> Any ideas? Which direction should I go (pure Lucene or Hadoop)?
>
> Thanks
> Luca
>



-- 
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878
