This is perfect, exactly what I was looking for. Thanks much, Andrzej!
On Mon, Mar 23, 2009 at 1:43 AM, Andrzej Bialecki <a...@getopt.org> wrote:
> Shashi Kant wrote:
>> Is there an "elegant" approach to partitioning a large Lucene index (~1TB)
>> into smaller sub-indexes other than the obvious method of re-indexing into
>> partitions? Any ideas?
>
> Try the following:
>
> * Open your index, and mark all documents as deleted except the 1/Nth that
>   should fill the first shard. Close the index, BUT DO NOT OPTIMIZE IT!
>
> * Create an IndexWriter, and use addIndexes to add the original index. Only
>   non-deleted docs will be copied.
>
> * Open the original index and use undeleteAll() to revert the deletions.
>
> * Mark the next 1/Nth of the documents as deleted.
>
> * Repeat the cycle as many times as needed.
>
> A more elegant version of this algorithm can be implemented using
> FilterIndexReader.
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
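For anyone following along, the mark-delete / copy-live / undelete cycle Andrzej describes can be sketched as a small self-contained simulation. This is NOT actual Lucene code (against a real index you would use IndexReader.deleteDocument(), IndexWriter.addIndexes(), and IndexReader.undeleteAll()); it only models the data flow with an in-memory doc list and a deletion mask standing in for Lucene's deleted-docs bitset. The round-robin shard assignment is an illustrative choice; any stable 1/Nth partitioning rule works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model of the shard-splitting trick. A "document" is just a String;
 * the boolean mask plays the role of Lucene's deleted-docs bitset.
 */
public class ShardSplitDemo {

    static List<List<String>> split(List<String> index, int numShards) {
        boolean[] deleted = new boolean[index.size()];
        List<List<String>> shards = new ArrayList<>();

        for (int shard = 0; shard < numShards; shard++) {
            // 1. Mark everything deleted except this shard's 1/Nth slice
            //    (in Lucene: IndexReader.deleteDocument(i), no optimize!).
            for (int i = 0; i < index.size(); i++) {
                deleted[i] = (i % numShards) != shard;
            }
            // 2. Copy only the non-deleted docs into the new shard
            //    (in Lucene: IndexWriter.addIndexes on the original index).
            List<String> copy = new ArrayList<>();
            for (int i = 0; i < index.size(); i++) {
                if (!deleted[i]) copy.add(index.get(i));
            }
            shards.add(copy);
            // 3. Revert all deletions before the next pass
            //    (in Lucene: IndexReader.undeleteAll()).
            Arrays.fill(deleted, false);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> index = List.of("d0", "d1", "d2", "d3", "d4", "d5", "d6");
        List<List<String>> shards = split(index, 3);
        for (int s = 0; s < shards.size(); s++) {
            System.out.println("shard " + s + ": " + shards.get(s));
        }
    }
}
```

Running this splits seven docs into shards [d0, d3, d6], [d1, d4], [d2, d5]: every document lands in exactly one shard, and the source "index" is left undamaged, which is the whole point of the undelete step.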