We've used Hadoop MapReduce with Solr to parallelize indexing for a customer, and that cut their multi-hour indexing process down to a couple of minutes. There is (or was) also a Lucene-level contrib module in Hadoop that uses MapReduce to parallelize indexing.
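The merge step at the end is plain Lucene: open an IndexWriter on the target index and call addIndexes() with the shard directories. Here's a minimal sketch against the Lucene 3.x API (the MergeShards class name and the argument handling are just for illustration):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.IOException;

public class MergeShards {
    public static void main(String[] args) throws IOException {
        // Last argument is the destination index; the preceding ones are
        // shard indexes built in parallel elsewhere and copied to this box.
        Directory merged = FSDirectory.open(new File(args[args.length - 1]));
        Directory[] shards = new Directory[args.length - 1];
        for (int i = 0; i < shards.length; i++) {
            shards[i] = FSDirectory.open(new File(args[i]));
        }
        IndexWriterConfig conf = new IndexWriterConfig(
                Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33));
        IndexWriter writer = new IndexWriter(merged, conf);
        writer.addIndexes(shards); // copies every shard's segments into the target
        writer.close();
    }
}

addIndexes() copies the shards' segments into the target, so the merged index contains the same documents you'd get from indexing everything on one machine; internal doc IDs and segment layout will differ, but for searching it's logically equivalent.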
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Guru Chandar <guru.chan...@consona.com>
> To: java-user@lucene.apache.org
> Cc:
> Sent: Thursday, June 30, 2011 5:12 AM
> Subject: distributing the indexing process
>
> If we have to index a lot of documents, is there a way to divide the
> documents into multiple sets and index them on multiple machines in
> parallel, and then merge the resulting indexes back into a single
> machine? If yes, will the result be logically equivalent to indexing all
> the documents on a single machine?
>
> Thanks,
> -gc