We've used Hadoop MapReduce with Solr to parallelize indexing for a customer,
which brought their multi-hour indexing process down to a couple of minutes.
There is (or was) also a Lucene-level contrib module in Hadoop that uses
MapReduce to parallelize indexing.
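
Once the partial indexes are built in parallel and copied to one box, the
merge step is just IndexWriter.addIndexes(). A minimal sketch (written
against a recent Lucene; on 3.x the IndexWriterConfig constructor also takes
a Version and FSDirectory.open() takes a File -- the paths below are just
placeholders):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import java.nio.file.Paths;

  public class MergeIndexes {
      public static void main(String[] args) throws Exception {
          // Target index that will hold the merged result.
          Directory target = FSDirectory.open(Paths.get("/indexes/merged"));
          IndexWriter writer = new IndexWriter(target,
                  new IndexWriterConfig(new StandardAnalyzer()));

          // Partial indexes built in parallel on separate machines and
          // copied here (hypothetical paths).
          Directory[] parts = new Directory[] {
                  FSDirectory.open(Paths.get("/indexes/part-0")),
                  FSDirectory.open(Paths.get("/indexes/part-1"))
          };

          // addIndexes() copies the segments of each partial index
          // into the target index.
          writer.addIndexes(parts);
          writer.close();
      }
  }

The merged index is logically equivalent to one built on a single machine:
the same documents are searchable, though internal document IDs and segment
layout can differ.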

Otis

----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


----- Original Message -----
> From: Guru Chandar <guru.chan...@consona.com>
> To: java-user@lucene.apache.org
> Cc: 
> Sent: Thursday, June 30, 2011 5:12 AM
> Subject: distributing the indexing process
> 
> 
> 
> If we have to index a lot of documents, is there a way to divide the
> documents into multiple sets and index them on multiple machines in
> parallel, and then merge the resulting indexes back into a single
> machine? If yes, will the result be logically equivalent to indexing all
> the documents on a single machine?
> 
> 
> 
> Thanks,
> 
> -gc
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
