I suppose we should add a CallerRunsMergeScheduler (a new superclass of SerialMergeScheduler)? Or make this aspect of SMS configurable. We might use a semaphore to control how many callers can merge at once (a limit of 1 would behave like today's SMS; a larger limit would let more callers merge concurrently). Whether it would still be "serial" at that point is debatable.
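To make the semaphore idea concrete, here is a rough sketch. It does not use Lucene's actual MergeScheduler API: the class name CallerRunsMergeSketch, the maxConcurrentMergers parameter, and the queue of Runnable merges are all hypothetical stand-ins, so treat it as an illustration of the control flow only.

```java
import java.util.Queue;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch, not Lucene code: the thread that calls merge() runs the
 * pending merges itself, and a semaphore bounds how many callers do so at once.
 */
final class CallerRunsMergeSketch {
    private final Semaphore permits;

    /** A limit of 1 gives strictly serial merging; larger limits admit more callers. */
    CallerRunsMergeSketch(int maxConcurrentMergers) {
        this.permits = new Semaphore(maxConcurrentMergers);
    }

    /**
     * Drains the queue of pending merges on the calling thread.
     * Blocks while all permits are held by other callers.
     * Returns the number of merges this caller ran.
     */
    int merge(Queue<Runnable> pendingMerges) {
        permits.acquireUninterruptibly();
        int done = 0;
        try {
            Runnable next;
            // poll() returns null once no merge is pending.
            while ((next = pendingMerges.poll()) != null) {
                next.run();
                done++;
            }
        } finally {
            // Always hand the permit back, even if a merge throws.
            permits.release();
        }
        return done;
    }
}
```

With several indexing threads calling merge() on a shared concurrent queue, at most maxConcurrentMergers of them would be merging at any moment and the rest would wait for a permit.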
I do think it'd be possible to merge parts of a segment at once! That'd be a cool feature to add.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Jan 25, 2021 at 11:05 AM Michael Sokolov <msoko...@gmail.com> wrote:

> It makes sense to me. I don't have the full picture, but I did just
> implement merging for vector format, and that at least, could be done
> fully concurrent with other formats. I expect the same is true of
> DocValues, Terms, etc. I'm not sure about the different kinds of
> DocValues - they might want to be done together?
>
> On Mon, Jan 25, 2021 at 5:45 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
> >
> > Hey everyone,
> >
> > I'm trying to cut the total wall-time of indexing for some fairly large
> > document collections on machines with a high CPU count (> 32 indexing
> > threads). So far my observations are:
> >
> > 1) I resigned from using the concurrent merge scheduler in favor of
> > "same thread" merging. This means the indexing thread that encounters a
> > merge just does it. The CMS is designed to favor concurrent searches over
> > indexing and it really didn't do anything I needed - in fact, I had to
> > disable most things it offers. I/O throttling and thread stalling are not
> > really practical on fast I/O in the absence of concurrent searches - you
> > can literally just use as many merge threads as needed to saturate the I/O.
> >
> > 2) It is quite frequent that everything is churning nicely until the
> > last few merges combine huge smaller segments and form a "long-tail" where
> > most cores are just idle... Here comes my question - can we execute the
> > individual "parts" involved in segment merging (the logic inside
> > SegmentMerger) in separate threads? On the surface it looks like these
> > steps can be done independently (even if they're executed sequentially at
> > the moment) but perhaps I'm missing something?
> >
> > I'd like to ask before I try to tinker with it. Thanks for any feedback.
> >
> > Dawid
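
On the question of running the individual parts of a segment merge in separate threads, the general shape might look like the sketch below. It does not touch SegmentMerger: the class ParallelMergePartsSketch, the mergeAll method, and the part names are hypothetical, and it assumes the parts really are independent, which this thread has not established (the different kinds of DocValues, for example, might need to run together).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch, not Lucene code: run the independent parts of one
 * merge on an executor instead of one after another.
 */
final class ParallelMergePartsSketch {

    /**
     * Submits every named part to the executor and returns once all of them
     * have finished. The returned names follow the map's iteration order.
     */
    static List<String> mergeAll(Map<String, Runnable> parts, ExecutorService executor) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (Map.Entry<String, Runnable> part : parts.entrySet()) {
            tasks.add(() -> {
                part.getValue().run();
                return part.getKey();
            });
        }
        List<String> finished = new ArrayList<>();
        try {
            // invokeAll blocks until every task has completed or failed.
            for (Future<String> future : executor.invokeAll(tasks)) {
                finished.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while merging", e);
        } catch (ExecutionException e) {
            // Surface the first failing part to the caller.
            throw new RuntimeException("a merge part failed", e.getCause());
        }
        return finished;
    }
}
```

In this shape the long tail of a large merge would be bounded by its slowest part rather than by the sum of all parts, provided the executor has idle threads to give it.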