Ok, so it sounds like if you want a very specific number of segments you have to do a forceMerge at some point?
Is there some simple summary on how segments are formed in the first place? Something like, "one segment is created every time you flush from an IndexWriter"? Based on some experimenting and reading the code, it seems to be quite complicated, especially once you start calling addDocument from several threads in parallel.

It's good to learn about the MultiReader. I'll look into that some more.

Thanks,
Alex

On Mon, Jul 5, 2021 at 9:14 AM Uwe Schindler <u...@thetaphi.de> wrote:

> If you want an exact number of segments, create 64 indexes, each
> forceMerged to one segment.
> After that use MultiReader to create a view on all separate indexes.
> MultiReader's contents are always flattened to a list of those 64 indexes.
>
> But keep in mind that this should only ever be done with *static* indexes.
> As soon as you have updates, this is a bad idea (forceMerge in general) and
> also splitting indexes like this. Parallelization should normally come from
> multiple queries running in parallel, but you shouldn't force Lucene to run
> a single query over so many indexes.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Alex K <aklib...@gmail.com>
> > Sent: Monday, July 5, 2021 4:04 AM
> > To: java-user@lucene.apache.org
> > Subject: Control the number of segments without using forceMerge.
> >
> > Hi all,
> >
> > I'm trying to figure out if there is a way to control the number of
> > segments in an index without explicitly calling forceMerge.
> >
> > My use-case looks like this: I need to index a static dataset of ~1
> > billion documents. I know the exact number of docs before indexing starts.
> > I know the VM where this index is searched has 64 threads. I'd like to end
> > up with exactly 64 segments, so I can search them in a parallelized fashion.
> >
> > I know that I could call forceMerge(64), but this takes an extremely long
> > time.
> >
> > Is there a straightforward way to ensure that I end up with 64 threads
> > without force-merging after adding all of the documents?
> >
> > Thanks in advance for any tips
> >
> > Alex Klibisz
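
For reference, since the document count is known before indexing starts, the "create 64 indexes" suggestion above implies deciding up front which documents go into which index. The following is only a minimal sketch of that bookkeeping in plain Java; it does not call Lucene at all. The class name ShardPlan and the method boundaries are hypothetical, and splitting into contiguous ranges of document ordinals is an assumption (round-robin assignment would work equally well).

```java
/**
 * Hypothetical helper (not part of Lucene): splits a known number of
 * documents into contiguous, near-equal ranges, one range per index.
 */
final class ShardPlan {

    private ShardPlan() {}

    /**
     * Returns numShards + 1 boundaries. Shard i covers the half-open range
     * [result[i], result[i + 1]) of document ordinals. Range sizes differ
     * by at most one document.
     */
    static long[] boundaries(long totalDocs, int numShards) {
        if (totalDocs < 0 || numShards <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and numShards > 0");
        }
        long base = totalDocs / numShards;      // minimum size of every shard
        long remainder = totalDocs % numShards; // first 'remainder' shards get one extra doc
        long[] result = new long[numShards + 1];
        for (int i = 0; i < numShards; i++) {
            long size = base + (i < remainder ? 1 : 0);
            result[i + 1] = result[i] + size;
        }
        return result;
    }
}
```

With ShardPlan.boundaries(1_000_000_000L, 64), each of the 64 ranges would be fed to its own IndexWriter, each resulting index forceMerged to one segment, and the 64 indexes combined through a MultiReader as described in the quoted reply. Those Lucene steps are intentionally left out of the sketch.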