Re: Merging segment parts concurrently (SegmentMerger)

Michael Sokolov Wed, 27 Jan 2021 03:42:27 -0800

I thought I remembered the discussion and searched for the issue in Jira, but
could not find it. Probably Mike used his souped-up search?


On Wed, Jan 27, 2021, 3:07 AM Dawid Weiss <dawid.we...@gmail.com> wrote:

> Darn... I swear sometimes, when I try hard enough, I can hear my brain
> cells giving up to atrophy... Sigh.
>
>
> D.
>
> On Wed, Jan 27, 2021 at 4:44 AM David Smiley <dsmi...@apache.org> wrote:
> >
> > LOL and it was Dawid :-)  Having amnesia, Dawid?
> > I think I've re-explored my own ideas before too.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Tue, Jan 26, 2021 at 5:39 PM Michael McCandless <luc...@mikemccandless.com> wrote:
> >>
> >> Oh, I found this issue from long ago (well, ~2 years) exploring this:
> >> https://issues.apache.org/jira/browse/LUCENE-8580
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Tue, Jan 26, 2021 at 3:38 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
> >>>
> >>> > +1 to make a single merge concurrent!  It is horribly frustrating to
> >>> > watch that last merge running on a single core :)  I have lost many hours
> >>> > of my life to this frustration.
> >>>
> >>> > Yeah... it is, isn't it? Especially on new machines where you have
> >>> > super-fast SSDs, countless cores, etc... That last merge consumes so few
> >>> > resources that the computer feels practically idle... it's hard to explain
> >>> > to people using our software (who invested in hardware) why we're basically
> >>> > doing nothing... :)
> >>>
> >>> > I do think we need to explore concurrency within terms/postings
> >>> > across fields in one segment to really see gains in the common case where
> >>> > merge time is dominated by postings.
> >>>
> >>> Yeah, probably.
> >>>
> >>> > If you want to experiment with something like that, you can
> >>> > hackishly simulate it today to quickly see the overhead, correct? It's a
> >>> > small hack to PerFieldPostingsFormat to force it to emit "files-per-field",
> >>> > and then CFS will combine it all together.
> >>>
> >>> Good idea, Robert. I'll try this.
> >>>
> >>> > By default, merging stored fields is super fast because Lucene can
> >>> > copy compressed data directly, but if there are deletes or index sorting is
> >>> > enabled, this optimization is no longer applicable, and I wouldn't be
> >>> > surprised if stored fields started taking non-negligible time.
> >>>
> >>> In this case these segments are essentially made from scratch but with
> >>> lots and lots of term vectors and postings... But the more parallel
> >>> stages we can introduce, the better.
> >>>
> >>> I have some other stuff on my plate before I can dive deep into this
> >>> but I eventually will. Thanks for the pointers, everyone - helpful!
> >>>
> >>> D.
