On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini <luca.rondan...@gmail.com> wrote:
> Hi everybody,
>
> I really need some good advice! I need to index in Lucene something like
> 1.4 billion documents. I have experience with Lucene, but I've never worked
> with such a big number of documents. Also, this is just the number of docs
> at "start-up": they are going to grow, and fast.
>
> I don't have to tell you that I need the system to be fast and to support
> real-time updates to the documents.
>
> The first solution that came to my mind was to use ParallelMultiSearcher,
> splitting the index into many "sub-indexes" (how many docs per index?
> 100,000?), but I don't have experience with it and I don't know how well it
> will scale as the number of documents grows!
>
> A more solid solution seems to be building some kind of integration with
> Hadoop, but I didn't find much about Lucene and Hadoop integration.
>
> Any idea? Which direction should I go (pure Lucene or Hadoop)?
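[For reference, a minimal sketch of the ParallelMultiSearcher approach mentioned in the question, against the Lucene 3.0-era API. The shard paths, field name, and shard count are made up for illustration; in practice shards are usually sized by index size and hardware rather than a fixed document count.]

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ShardedSearchSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical shard locations, one Lucene index per shard.
            String[] shardPaths = { "/indexes/shard0", "/indexes/shard1", "/indexes/shard2" };

            // Open one read-only IndexSearcher per shard.
            Searchable[] searchers = new Searchable[shardPaths.length];
            for (int i = 0; i < shardPaths.length; i++) {
                searchers[i] = new IndexSearcher(
                        FSDirectory.open(new File(shardPaths[i])), true);
            }

            // ParallelMultiSearcher fans each query out to the sub-searchers
            // in parallel and merges the top hits.
            ParallelMultiSearcher searcher = new ParallelMultiSearcher(searchers);

            QueryParser parser = new QueryParser(Version.LUCENE_30, "body",
                    new StandardAnalyzer(Version.LUCENE_30));
            Query query = parser.parse("hello world");
            TopDocs hits = searcher.search(query, 10);
            System.out.println("total hits: " + hits.totalHits);

            searcher.close();
        }
    }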
There seems to be a common misconception about Hadoop regarding search.
Map-reduce as implemented in Hadoop is really only for batch-oriented jobs
(or jobs where you don't need a quick response time). It's definitely not
for serving normal queries (unless you have unusual requirements).

-Yonik
http://www.lucidimagination.com