We've used Hadoop MapReduce with Solr to parallelize indexing for a customer, and that cut their multi-hour indexing process down to a couple of minutes. There is (or was) also a Lucene-level contrib module in Hadoop that uses MapReduce to parallelize indexing.
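The merge step at the end is plain Lucene: open an IndexWriter on the target index and call addIndexes() with the shard directories. Here's a minimal sketch against the Lucene 3.x API (the MergeShards class name and the argument handling are just for illustration):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.IOException;

public class MergeShards {
    public static void main(String[] args) throws IOException {
        // Last argument is the destination index; the preceding ones are
        // shard indexes built in parallel elsewhere and copied to this box.
        Directory merged = FSDirectory.open(new File(args[args.length - 1]));
        Directory[] shards = new Directory[args.length - 1];
        for (int i = 0; i < shards.length; i++) {
            shards[i] = FSDirectory.open(new File(args[i]));
        }
        IndexWriterConfig conf = new IndexWriterConfig(
                Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33));
        IndexWriter writer = new IndexWriter(merged, conf);
        writer.addIndexes(shards); // copies every shard's segments into the target
        writer.close();
    }
}

addIndexes() copies the shards' segments into the target, so the merged index contains the same documents you'd get from indexing everything on one machine; internal doc IDs and segment layout will differ, but for searching it's logically equivalent.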
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Guru Chandar <guru.chan...@consona.com>
> To: java-user@lucene.apache.org
> Cc:
> Sent: Thursday, June 30, 2011 5:12 AM
> Subject: distributing the indexing process
>
> If we have to index a lot of documents, is there a way to divide the
> documents into multiple sets and index them on multiple machines in
> parallel, and then merge the resulting indexes back into a single
> machine? If yes, will the result be logically equivalent to indexing all
> the documents on a single machine?
>
> Thanks,
> -gc