I suppose we should add a CallerRunsMergeScheduler (a new superclass of
SerialMergeScheduler)?  Or make this aspect of SMS configurable?  We might
use a semaphore to control how many callers can merge at once (1 == today's
SMS, larger values let more callers merge concurrently).  It's debatable
whether it would then still be "serial".
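
To make the semaphore idea concrete, here is a minimal sketch. It is not
Lucene code: the class name CallerRunsMergeGate, the runMerge method and the
peakConcurrency counter are all made up for illustration, and the merge
itself is just a Runnable. A real scheduler would have to plug into
MergeScheduler and pull merges from the IndexWriter instead.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not part of Lucene: lets at most `maxCallers`
// calling threads run a merge at the same time.
final class CallerRunsMergeGate {
  private final Semaphore permits;
  private final AtomicInteger active = new AtomicInteger();
  private final AtomicInteger peak = new AtomicInteger();

  CallerRunsMergeGate(int maxCallers) {
    if (maxCallers < 1) {
      throw new IllegalArgumentException("maxCallers must be >= 1");
    }
    this.permits = new Semaphore(maxCallers);
  }

  // Runs the merge on the calling thread; blocks while all permits are taken.
  void runMerge(Runnable merge) throws InterruptedException {
    permits.acquire();
    try {
      // Track the highest number of simultaneous merges (for inspection only).
      int now = active.incrementAndGet();
      peak.accumulateAndGet(now, Math::max);
      merge.run();
    } finally {
      active.decrementAndGet();
      permits.release();
    }
  }

  int peakConcurrency() {
    return peak.get();
  }
}
```

With maxCallers == 1 this behaves like a single lock, i.e. one merge at a
time; larger values let that many indexing threads merge side by side.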

I do think it'd be possible to merge parts of a segment at once!  That'd be
a cool feature to add.
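
Roughly, merging the parts at once could look like the sketch below. It
assumes the per-format steps share no mutable state, which this thread
leaves open (e.g. the different kinds of DocValues may need to run
together). ParallelMergeSteps and runAll are hypothetical names, and each
step is a plain Callable standing in for the work SegmentMerger does today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not part of Lucene: runs independent merge steps
// concurrently and waits for all of them to finish.
final class ParallelMergeSteps {
  // Returns the step results in the same order as the input list.
  static <T> List<T> runAll(List<Callable<T>> steps, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<T> results = new ArrayList<>();
      // invokeAll blocks until every step has completed or failed.
      for (Future<T> f : pool.invokeAll(steps)) {
        results.add(f.get()); // a failed step surfaces as ExecutionException
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

A caller would pass one Callable per part (terms, doc values, vectors, ...)
and only finish the merged segment once runAll returns.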

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jan 25, 2021 at 11:05 AM Michael Sokolov <msoko...@gmail.com> wrote:

> It makes sense to me. I don't have the full picture, but I did just
> implement merging for vector format, and that at least, could be done
> fully concurrent with other formats. I expect the same is true of
> DocValues, Terms, etc. I'm not sure about the different kinds of
> DocValues - they might want to be done together?
>
> On Mon, Jan 25, 2021 at 5:45 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
> >
> >
> > Hey everyone,
> >
> > I'm trying to cut the total wall-time of indexing for some fairly large
> > document collections on machines with a high CPU count (> 32 indexing
> > threads). So far my observations are:
> >
> > 1) I gave up on the concurrent merge scheduler in favor of
> > "same thread" merging. This means the indexing thread that encounters a
> > merge just does it. The CMS is designed to favor concurrent searches over
> > indexing and it really didn't do anything I needed - in fact, I had to
> > disable most things it offers. I/O throttling and thread stalling are not
> > really practical on fast I/O in the absence of concurrent searches - you
> > can literally just use as many merge threads as needed to saturate the I/O.
> >
> > 2) It is quite frequent that everything is churning nicely until the
> > last few merges combine huge smaller segments and form a "long-tail" where
> > most cores are just idle... Here comes my question - can we execute the
> > individual "parts" involved in segment merging (the logic inside
> > SegmentMerger) in separate threads? On the surface it looks like these
> > steps can be done independently (even if they're executed sequentially at
> > the moment) but perhaps I'm missing something?
> >
> > I'd like to ask before I try to tinker with it. Thanks for any feedback.
> >
> > Dawid
>
