Re: Merging segment parts concurrently (SegmentMerger)

Michael McCandless Tue, 26 Jan 2021 14:39:32 -0800

Oh, I found this old (well, ~2 years ago) issue exploring this:
https://issues.apache.org/jira/browse/LUCENE-8580


Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 26, 2021 at 3:38 PM Dawid Weiss <dawid.we...@gmail.com> wrote:

> > +1 to make a single merge concurrent!  It is horribly frustrating to
> watch that last merge running on a single core :)  I have lost many hours
> of my life to this frustration.
>
> > Yeah... it is, isn't it? Especially on new machines where you have
> super-fast SSDs, countless cores, etc... That last merge consumes so few
> resources that the computer feels practically idle... it's hard to explain
> to people using our software (who invested in hardware) why we're basically
> doing nothing... :)
>
> > I do think we need to explore concurrency within terms/postings across
> fields in one segment to really see gains in the common case where merge
> time is dominated by postings.
>
> Yeah, probably.
>
> > if you want to experiment with something like that, you can hackishly
> simulate it today to quickly see the overhead, correct? It's a small hack to
> PerFieldPostingsFormat to force it to emit "files-per-field" and then CFS
> will combine it all together.
>
> Good idea, Robert. I'll try this.
>
> > By default merging stored fields is super fast because Lucene can copy
> compressed data directly, but if there are deletes or index sorting is
> enabled this optimization is not applicable anymore and I wouldn't be
> surprised if stored fields started taking non-negligible time.
>
> In this case these segments are essentially made from scratch but with
> lots and lots of term vectors and postings... But the more parallel
> stages we can introduce, the better.
>
> I have some other stuff on my plate before I can dive deep into this
> but I eventually will. Thanks for the pointers, everyone - helpful!
>
> D.
>
