Re: Merging segment parts concurrently (SegmentMerger)

Dawid Weiss Wed, 27 Jan 2021 00:07:31 -0800

Darn... I swear sometimes, when I try hard enough, I can hear my brain
cells giving up to atrophy... Sigh.



D.

On Wed, Jan 27, 2021 at 4:44 AM David Smiley <dsmi...@apache.org> wrote:
>
> LOL, and it was Dawid :-)  Having amnesia, Dawid?
> I think I've re-explored my own ideas before too.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Jan 26, 2021 at 5:39 PM Michael McCandless 
> <luc...@mikemccandless.com> wrote:
>>
>> Oh, I found this issue from long ago (well, ~2 years) exploring this:
>> https://issues.apache.org/jira/browse/LUCENE-8580
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Jan 26, 2021 at 3:38 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
>>>
>>> > +1 to make a single merge concurrent!  It is horribly frustrating to 
>>> > watch that last merge running on a single core :)  I have lost many hours 
>>> > of my life to this frustration.
>>>
>>> Yeah... it is, isn't it? Especially on new machines where you have
>>> super-fast SSDs, countless cores, etc... That last merge consumes so few
>>> resources that the computer feels practically idle... it's hard to
>>> explain to people using our software (who invested in hardware) why we're
>>> basically doing nothing... :)
>>>
>>> > I do think we need to explore concurrency within terms/postings across 
>>> > fields in one segment to really see gains in the common case where merge 
>>> > time is dominated by postings.
>>>
>>> Yeah, probably.
>>>
>>> > If you want to experiment with something like that, you can hackishly
>>> > simulate it today to quickly see the overhead, correct? It's a small hack
>>> > to PerFieldPostingsFormat to force it to emit "files-per-field" and then
>>> > CFS will combine it all together.
>>>
>>> Good idea, Robert. I'll try this.
>>>
>>> > By default merging stored fields is super fast because Lucene can copy 
>>> > compressed data directly, but if there are deletes or index sorting is 
>>> > enabled this optimization is not applicable anymore and I wouldn't be 
>>> > surprised if stored fields started taking non-negligible time.
>>>
>>> In this case these segments are essentially made from scratch but with
>>> lots and lots of term vectors and postings... But the more parallel
>>> stages we can introduce, the better.
>>>
>>> I have some other stuff on my plate before I can dive deep into this
>>> but I eventually will. Thanks for the pointers, everyone - helpful!
>>>
>>> D.
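
A minimal sketch of the idea discussed in this thread, assuming the parts of a segment merge (stored fields, term vectors, postings, and so on) can be written independently of each other. Nothing here is Lucene API: ConcurrentPartMergeSketch, mergeParts, and the integer "documents written" result are hypothetical names, used only to illustrate submitting the parts to a thread pool and waiting for all of them to finish.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: none of these names exist in Lucene.
final class ConcurrentPartMergeSketch {

  /**
   * Submits every named part to the executor, then waits for all of them.
   * Returns each part's result (here: a made-up "documents written" count)
   * keyed by part name, in submission order.
   */
  static Map<String, Integer> mergeParts(
      Map<String, Callable<Integer>> parts, ExecutorService executor) {
    // Submit everything first so the parts can actually overlap.
    Map<String, Future<Integer>> futures = new LinkedHashMap<>();
    for (Map.Entry<String, Callable<Integer>> part : parts.entrySet()) {
      futures.put(part.getKey(), executor.submit(part.getValue()));
    }

    // Collect results; a failure in any part aborts the whole merge.
    Map<String, Integer> results = new LinkedHashMap<>();
    for (Map.Entry<String, Future<Integer>> pending : futures.entrySet()) {
      try {
        results.put(pending.getKey(), pending.getValue().get());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while merging " + pending.getKey(), e);
      } catch (ExecutionException e) {
        throw new IllegalStateException("Merging " + pending.getKey() + " failed", e.getCause());
      }
    }
    return results;
  }

  public static void main(String[] args) {
    int docCount = 1000; // assumed size of the merged segment
    Map<String, Callable<Integer>> parts = new LinkedHashMap<>();
    // Stand-ins for the real per-part work; each just reports docCount.
    parts.put("storedFields", () -> docCount);
    parts.put("termVectors", () -> docCount);
    parts.put("postings", () -> docCount);

    ExecutorService executor =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    try {
      System.out.println(mergeParts(parts, executor));
    } finally {
      executor.shutdown();
    }
  }
}
```

Whether this pays off depends on the point raised in the thread: if merge time is dominated by postings, running whole parts side by side will not help much, and concurrency within postings (for example across fields) would be needed.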
