If you want an exact number of segments, create 64 separate indexes and 
forceMerge each of them to a single segment.
After that, use MultiReader to create a view over all of the separate indexes. 
A MultiReader's contents are always flattened to a list of leaves, so with 64 
single-segment indexes you get exactly 64 segments.
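
Because the total document count is known before indexing starts, filling the
64 indexes evenly comes down to a routing rule. Below is a minimal sketch in
plain Java, assuming documents are numbered 0..N-1 and assigned to shards in
contiguous ranges. The class and method names (ShardRouter, shardFor,
shardSize) are hypothetical and not part of Lucene; in a real build each shard
would have its own IndexWriter, be forceMerged to one segment, and then be
opened under a single MultiReader.

```java
/**
 * Hypothetical helper (not a Lucene class): maps a document ordinal to one of
 * a fixed number of shards so that shard sizes differ by at most one document.
 */
public final class ShardRouter {
    private final long totalDocs;
    private final int numShards;

    public ShardRouter(long totalDocs, int numShards) {
        if (totalDocs < 0 || numShards <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and numShards > 0");
        }
        this.totalDocs = totalDocs;
        this.numShards = numShards;
    }

    /** Number of documents assigned to the given shard. */
    public long shardSize(int shard) {
        long base = totalDocs / numShards;
        // The first (totalDocs % numShards) shards take one extra document.
        return shard < totalDocs % numShards ? base + 1 : base;
    }

    /** Shard index in [0, numShards) for the document with this ordinal. */
    public int shardFor(long docOrdinal) {
        if (docOrdinal < 0 || docOrdinal >= totalDocs) {
            throw new IndexOutOfBoundsException("docOrdinal out of range: " + docOrdinal);
        }
        long base = totalDocs / numShards;
        long remainder = totalDocs % numShards;
        // Ordinals below this boundary fall into the larger (base + 1) shards.
        long boundary = remainder * (base + 1);
        if (docOrdinal < boundary) {
            return (int) (docOrdinal / (base + 1));
        }
        return (int) (remainder + (docOrdinal - boundary) / base);
    }
}
```

Contiguous ranges are only one possible choice; a modulo rule would balance the
shards equally well. What matters for the approach above is that every shard
is written once and never updated afterwards.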

But keep in mind that this should only ever be done with *static* indexes. As 
soon as you have updates, both forceMerge in general and splitting the index 
like this are a bad idea. Parallelization should normally come from multiple 
queries running in parallel; you shouldn't force Lucene to run a single 
query over so many indexes.
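
The "multiple queries running in parallel" model can be sketched without any
Lucene code: a fixed thread pool, sized for example to the 64 threads
mentioned in the question, where each query is handled from start to finish
by one worker. The ConcurrentQueries class and the searchOne function are
hypothetical stand-ins; in a real application searchOne would call a shared
IndexSearcher, which is safe to use from multiple threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: runs many independent queries on a fixed thread pool. */
public final class ConcurrentQueries {
    private ConcurrentQueries() {}

    /**
     * Submits every query as its own task and returns the results in the
     * same order as the input list. Each query runs on a single thread;
     * the parallelism comes from running many queries at once.
     */
    public static <Q, R> List<R> runAll(List<Q> queries, Function<Q, R> searchOne, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>(queries.size());
            for (Q query : queries) {
                Callable<R> task = () -> searchOne.apply(query);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>(futures.size());
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that query has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-running service the pool would be created once and reused rather
than built per call; it is created inside the method here only to keep the
sketch self-contained.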

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Alex K <aklib...@gmail.com>
> Sent: Monday, July 5, 2021 4:04 AM
> To: java-user@lucene.apache.org
> Subject: Control the number of segments without using forceMerge.
> 
> Hi all,
> 
> I'm trying to figure out if there is a way to control the number of
> segments in an index without explicitly calling forceMerge.
> 
> My use-case looks like this: I need to index a static dataset of ~1
> billion documents. I know the exact number of docs before indexing starts.
> I know the VM where this index is searched has 64 threads. I'd like to end
> up with exactly 64 segments, so I can search them in a parallelized fashion.
> 
> I know that I could call forceMerge(64), but this takes an extremely long
> time.
> 
> Is there a straightforward way to ensure that I end up with 64 segments
> without force-merging after adding all of the documents?
> 
> Thanks in advance for any tips
> 
> Alex Klibisz

