[ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763748#comment-16763748
 ] 

Erick Erickson commented on LUCENE-8688:
----------------------------------------

Yes, certainly introduced in LUCENE-7976.

Hmmm. Going largely from memory since I'm on vacation. Are you saying that 
when the number of segments is specified, we're merging and re-merging the same 
data? I.e. merging 30 segments (maxMergeAtOnceExplicit) into one segment, then 
merging _that_ segment later because it's still relatively small?

Or are segments that are already close to the new max segment size, with no 
deleted docs, being merged with small segments? That would be pretty wasteful...

I pretty much blindly let the merge scoring algorithm do its thing without 
special handling for this case other than to compute the theoretical segment 
size and let the scoring pick segments to merge, so there's certainly room for 
refining based on write ops in this case.

I've been wondering for a while whether maxMergeAtOnceExplicit should be made 
larger (or eliminated). Would that alter the writes the user is seeing?

All that said, pulling back the code for findForcedMerges from before 
LUCENE-7976 and using it when the number of segments is specified is certainly 
an option and would be a quick fix. 
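
To put rough numbers on the difference, here is a minimal sketch in plain Java. 
It is not the findForcedMerges code from before or after LUCENE-7976. The class 
and method names are made up for illustration, sizes are arbitrary units, and 
it ignores the max segment size and deleted docs entirely. It also assumes the 
wide merge happens to take the smallest segments, which is the cheapest case 
for it, so the real gap could be larger. It compares the bytes rewritten by 
the smallest merge that reaches maxSegmentCount against a single merge of 
maxMergeAtOnceExplicit segments.

```java
import java.util.Arrays;

/** Illustrative only: not Lucene code, all names here are hypothetical. */
final class ForcedMergeSketch {

    /** Sums the {@code count} smallest entries of {@code sizes}. */
    private static long sumSmallest(long[] sizes, int count) {
        long[] sorted = sizes.clone();
        Arrays.sort(sorted); // ascending, so the smallest segments come first
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += sorted[i];
        }
        return total;
    }

    /**
     * Bytes rewritten by the least merging that leaves at most
     * maxSegmentCount segments: merge the (n - maxSegmentCount + 1)
     * smallest segments into one.
     */
    static long minimalMergeBytes(long[] sizes, int maxSegmentCount) {
        if (maxSegmentCount < 1) {
            throw new IllegalArgumentException("maxSegmentCount must be >= 1");
        }
        if (sizes.length <= maxSegmentCount) {
            return 0L; // already at or below the target, nothing to merge
        }
        return sumSmallest(sizes, sizes.length - maxSegmentCount + 1);
    }

    /**
     * Bytes rewritten by one merge of up to mergeAtOnce segments, assuming
     * (optimistically) that it picks the smallest ones.
     */
    static long wideMergeBytes(long[] sizes, int mergeAtOnce) {
        if (mergeAtOnce < 2) {
            throw new IllegalArgumentException("mergeAtOnce must be >= 2");
        }
        return sumSmallest(sizes, Math.min(mergeAtOnce, sizes.length));
    }

    public static void main(String[] args) {
        // 40 segments of sizes 1..40, forced merge down to 35 segments.
        long[] sizes = new long[40];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = i + 1;
        }
        System.out.println(minimalMergeBytes(sizes, 35)); // 21  (1 + ... + 6)
        System.out.println(wideMergeBytes(sizes, 30));    // 465 (1 + ... + 30)
    }
}
```

In this toy case the 30-segment merge rewrites roughly 22 times as much data 
as the minimal one, and leaves 11 segments where 35 were allowed.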

> Forced merges merge more than necessary
> ---------------------------------------
>
>                 Key: LUCENE-8688
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8688
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Priority: Minor
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5, forceMerge picked the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
