Oh, I found this old (well, ~2 years old) issue exploring this: https://issues.apache.org/jira/browse/LUCENE-8580

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jan 26, 2021 at 3:38 PM Dawid Weiss <dawid.we...@gmail.com> wrote:

> > +1 to make a single merge concurrent! It is horribly frustrating to
> > watch that last merge running on a single core :) I have lost many
> > hours of my life to this frustration.
>
> Yeah... it is, isn't it? Especially on new machines where you have
> super-fast SSDs, countless cores, etc... That last merge consumes so few
> resources that the computer feels practically idle... it's hard to explain
> to people using our software (who invested in hardware) why we're basically
> doing nothing... :)
>
> > I do think we need to explore concurrency within terms/postings across
> > fields in one segment to really see gains in the common case where merge
> > time is dominated by postings.
>
> Yeah, probably.
>
> > if you want to experiment with something like that, you can hackishly
> > simulate it today to quickly see the overhead, correct? its a small hack
> > to PerFieldPostingsFormat to force it to emit "files-per-field" and then
> > CFS will combine it all together.
>
> Good idea, Robert. I'll try this.
>
> > By default merging stored fields is super fast because Lucene can copy
> > compressed data directly, but if there are deletes or index sorting is
> > enabled this optimization is not applicable anymore and I wouldn't be
> > surprised if stored fields started taking non negligible time.
>
> In this case these segments are essentially made from scratch but with
> lots and lots of term vectors and postings... But the more parallel
> stages we can introduce, the better.
>
> I have some other stuff on my plate before I can dive deep into this
> but I eventually will. Thanks for the pointers, everyone - helpful!
>
> D.