[ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776543#comment-16776543
 ] 

Armin Braun commented on LUCENE-8688:
-------------------------------------

Gave this a shot in the attached patch:
 * Basically brought back the old logic (pre-LUCENE-7976) of simply collecting 
as many of the smallest segments as possible ("possible" now includes the max 
segment size check).
 ** Made the trade-off of merging the smallest remaining segments to get to the 
requested segment count.
 ** Technically, a smarter bin-packing algorithm could do better than this 
trade-off in some cases, but the comment above described merging large segments 
that are close to bin size with tiny segments as wasteful, so I didn't try that.
 * Added a new rough test that checks that we arrive at the exact max segment 
count and don't significantly exceed the max segment size.
 ** It is much stricter about size than the existing test.
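
For illustration, the selection described in the bullets above can be sketched 
roughly as follows. This is a minimal, self-contained sketch and not the code in 
the attached patch: the class ForcedMergeSketch, the method selectMerges, and 
its parameters are hypothetical, segments are reduced to plain byte sizes, and 
the fallback that ignores the size cap is only one reading of the trade-off 
mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Lucene's TieredMergePolicy and not the patch.
final class ForcedMergeSketch {

  /**
   * Groups the smallest segments into merges until at most maxSegmentCount
   * segments would remain. Each inner list holds the sizes of one merge.
   */
  static List<List<Long>> selectMerges(
      long[] segmentSizes, int maxSegmentCount, long maxMergedBytes) {
    if (maxSegmentCount < 1) {
      throw new IllegalArgumentException("maxSegmentCount must be >= 1");
    }
    long[] sorted = segmentSizes.clone();
    Arrays.sort(sorted); // smallest segments first
    List<List<Long>> merges = new ArrayList<>();
    int count = sorted.length; // segment count once the selected merges have run
    int next = 0; // index of the smallest segment not yet assigned to a merge

    while (count > maxSegmentCount) {
      // Merging k segments removes k - 1 of them, so this many are still needed.
      int needed = count - maxSegmentCount + 1;
      List<Long> merge = new ArrayList<>();
      long total = 0;
      while (merge.size() < needed
          && next < sorted.length
          && total + sorted[next] <= maxMergedBytes) {
        total += sorted[next];
        merge.add(sorted[next++]);
      }
      if (merge.size() < 2) {
        // Assumed trade-off: the size cap blocks every remaining pair, so merge
        // the smallest remaining segments anyway to approach the target count.
        next -= merge.size(); // put a lone collected segment back
        List<Long> forced = new ArrayList<>();
        while (forced.size() < needed && next < sorted.length) {
          forced.add(sorted[next++]);
        }
        if (forced.size() >= 2) {
          merges.add(forced);
        }
        break;
      }
      merges.add(merge);
      count -= merge.size() - 1;
    }
    return merges;
  }

  public static void main(String[] args) {
    // Five segments, target of three: only the three smallest get merged.
    System.out.println(selectMerges(new long[] {50, 10, 20, 40, 30}, 3, 100));
  }
}
```

With sizes 50, 10, 20, 40, 30, a target of 3 segments and a cap of 100, the 
sketch selects a single merge of the three smallest segments and leaves the two 
largest alone. It does not re-merge the outputs of earlier groups, so with a 
very tight cap it can stop short of the requested count.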

[^LUCENE-8688.patch]

> Forced merges merge more than necessary
> ---------------------------------------
>
>                 Key: LUCENE-8688
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8688
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5, forceMerge would pick the least amount of merging that would 
> result in an index with at most maxSegmentCount segments. Now that forced 
> merges share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments), given that merges with 
> more segments usually score better. This is because natural merges assume 
> that merges that run now save work for later, so the more segments get 
> merged, the better. This assumption doesn't hold for forced merges, which 
> should run on read-only indices where there won't be any future merging.


