Re: Merging segment parts concurrently (SegmentMerger)

Michael McCandless Wed, 27 Jan 2021 04:21:24 -0800

LOL Mike did use http://jirasearch.mikemccandless.com, our dog food Lucene
search application demonstrating many of Lucene's features (
http://blog.mikemccandless.com/2016/10/jiraseseach-20-dog-food-using-lucene-to.html),
but it was NOT easy to find!


I think I had one lonely brain cell still insisting we had indeed talked
about this somewhat recently :)

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jan 27, 2021 at 6:43 AM Michael Sokolov <msoko...@gmail.com> wrote:

> I thought I remembered the discussion and searched for the issue in Jira, but
> could not find it. Probably Mike used his souped-up search?
>
> On Wed, Jan 27, 2021, 3:07 AM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
>> Darn... I swear sometimes, when I try hard enough, I can hear my brain
>> cells giving up to atrophy... Sigh.
>>
>>
>> D.
>>
>> On Wed, Jan 27, 2021 at 4:44 AM David Smiley <dsmi...@apache.org> wrote:
>> >
>> > LOL and it was Dawid :-)  Having amnesia, Dawid?
>> > I think I've re-explored my own ideas before too.
>> >
>> > ~ David Smiley
>> > Apache Lucene/Solr Search Developer
>> > http://www.linkedin.com/in/davidwsmiley
>> >
>> >
>> > On Tue, Jan 26, 2021 at 5:39 PM Michael McCandless <luc...@mikemccandless.com> wrote:
>> >>
>> >> Oh, I found this issue from long ago (well, ~2 years) exploring this:
>> >> https://issues.apache.org/jira/browse/LUCENE-8580
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Tue, Jan 26, 2021 at 3:38 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
>> >>>
>> >>> > +1 to make a single merge concurrent!  It is horribly frustrating
>> >>> > to watch that last merge running on a single core :)  I have lost many
>> >>> > hours of my life to this frustration.
>> >>>
>> >>> > Yeah... it is, isn't it? Especially on new machines where you have
>> >>> > super-fast SSDs, countless cores, etc... That last merge consumes so few
>> >>> > resources that the computer feels practically idle... it's hard to explain
>> >>> > to people using our software (who invested in hardware) why we're basically
>> >>> > doing nothing... :)
>> >>>
>> >>> > I do think we need to explore concurrency within terms/postings
>> >>> > across fields in one segment to really see gains in the common case where
>> >>> > merge time is dominated by postings.
>> >>>
>> >>> Yeah, probably.
>> >>>
>> >>> > If you want to experiment with something like that, you can
>> >>> > hackishly simulate it today to quickly see the overhead, correct? It's a
>> >>> > small hack to PerFieldPostingsFormat to force it to emit "files-per-field"
>> >>> > and then CFS will combine it all together.
>> >>>
>> >>> Good idea, Robert. I'll try this.
>> >>>
>> >>> > By default, merging stored fields is super fast because Lucene can
>> >>> > copy compressed data directly, but if there are deletes or index sorting is
>> >>> > enabled, this optimization is not applicable anymore and I wouldn't be
>> >>> > surprised if stored fields started taking non-negligible time.
>> >>>
>> >>> In this case these segments are essentially made from scratch but with
>> >>> lots and lots of term vectors and postings... But the more parallel
>> >>> stages we can introduce, the better.
>> >>>
>> >>> I have some other stuff on my plate before I can dive deep into this
>> >>> but I eventually will. Thanks for the pointers, everyone - helpful!
>> >>>
>> >>> D.
>>
>>
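
For readers following the thread, the idea of running the independent parts of
one merge in parallel (stored fields, term vectors, postings) can be sketched
roughly as below. This is only an illustration under stated assumptions: the
class ConcurrentMergeSketch, the method mergeParts, and the part names are
hypothetical and are not Lucene's SegmentMerger API. Each "part" is modeled as
a plain Callable that returns a count of merged items.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; none of these names are Lucene APIs. */
public class ConcurrentMergeSketch {

  /**
   * Submits every named merge part to a fixed-size thread pool, waits for all
   * of them, and returns the per-part results in submission order.
   */
  public static Map<String, Integer> mergeParts(
      Map<String, Callable<Integer>> parts, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Start all parts first so they can overlap in time.
      Map<String, Future<Integer>> futures = new LinkedHashMap<>();
      for (Map.Entry<String, Callable<Integer>> e : parts.entrySet()) {
        futures.put(e.getKey(), pool.submit(e.getValue()));
      }
      // Then block on each result; a failure in any part surfaces here
      // as an ExecutionException.
      Map<String, Integer> results = new LinkedHashMap<>();
      for (Map.Entry<String, Future<Integer>> e : futures.entrySet()) {
        results.put(e.getKey(), e.getValue().get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Placeholder work: each lambda stands in for merging one segment part.
    Map<String, Callable<Integer>> parts = new LinkedHashMap<>();
    parts.put("storedFields", () -> 100);
    parts.put("termVectors", () -> 250);
    parts.put("postings", () -> 900);
    System.out.println(mergeParts(parts, 3));
  }
}
```

With a split like this, the wall-clock time of the merge is bounded by the
slowest part, which is why the thread also raises concurrency within postings
for the case where postings dominate.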
