[
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763748#comment-16763748
]
Erick Erickson commented on LUCENE-8688:
----------------------------------------
Yes, certainly introduced in LUCENE-7976.
Hmmm. Going largely from memory since I'm on vacation. Are you saying that
when the number of segments is specified, we're merging and re-merging the same
data? I.e., merging 30 segments (maxMergeAtOnceExplicit) into one segment, then
merging _that_ segment later because it's still relatively small?
Or are segments that are already close to the new max segment size, with no
deleted docs, being merged with small segments? That would be pretty wasteful...
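To make the first question concrete, here is a toy Java model of the re-merging case. It assumes a hypothetical selection rule (always merge the maxMergeAtOnce smallest segments, never fewer) in place of TieredMergePolicy's actual score-based selection, and it measures cost as total bytes written.

```java
import java.util.List;
import java.util.PriorityQueue;

/**
 * Toy model (hypothetical, NOT TieredMergePolicy's real selection): while there are
 * more than maxSegmentCount segments, merge the maxMergeAtOnce smallest ones.
 * Returns total bytes written, so data that gets re-merged is counted each time.
 */
final class CascadeMergeSketch {
    static long bytesWritten(List<Long> segmentSizes, int maxSegmentCount, int maxMergeAtOnce) {
        if (maxSegmentCount < 1 || maxMergeAtOnce < 2) {
            throw new IllegalArgumentException("need maxSegmentCount >= 1 and maxMergeAtOnce >= 2");
        }
        PriorityQueue<Long> segments = new PriorityQueue<>(segmentSizes); // smallest first
        long written = 0;
        while (segments.size() > maxSegmentCount) {
            // Always take as many segments as allowed, even if fewer would reach the target.
            int n = Math.min(maxMergeAtOnce, segments.size());
            long merged = 0;
            for (int i = 0; i < n; i++) {
                merged += segments.poll();
            }
            written += merged;    // every input byte is rewritten into the new segment
            segments.add(merged); // the new segment may itself be merged again later
        }
        return written;
    }
}
```

With 40 equal 1-unit segments and a target of 5, a cap of 30 writes 70 units: the first 30 units are written into a new segment and then written again when that segment is merged with the remaining 10. With the cap raised to 50, the same model writes 40 units. This is only a model; whether the real policy behaves this way is the open question above.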
I pretty much blindly let the merge scoring algorithm do its thing without
special handling for this case other than to compute the theoretical segment
size and let the scoring pick segments to merge, so there's certainly room for
refining based on write ops in this case.
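The intuition that bigger merges score better can be shown with a simplified score loosely modeled on the skew factor in TieredMergePolicy's scoring, where lower is better. This is a sketch under stated assumptions: it ignores deletions, size flooring and the max-segment-size cap, and it is not the real scoring code.

```java
import java.util.List;

/** Simplified merge score, loosely modeled on TieredMergePolicy's skew term; lower is better. */
final class MergeScoreSketch {
    static double score(List<Long> candidateSizes) {
        long total = 0;
        long largest = 0;
        for (long size : candidateSizes) {
            total += size;
            largest = Math.max(largest, size);
        }
        if (total <= 0) {
            throw new IllegalArgumentException("candidate must contain bytes");
        }
        // Skew: share of the result contributed by the largest input (1/N for N equal segments).
        double skew = (double) largest / total;
        // Mild penalty on the size of the merged segment.
        return skew * Math.pow(total, 0.05);
    }
}
```

For N equal segments of size s this reduces to N^-0.95 * s^0.05, which keeps falling as N grows, so among equal-size candidates the 30-segment merge wins. That is reasonable when the merged segments would otherwise be merged later anyway, but a forced merge on a read-only index has no later merges to save.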
I've been wondering for a while whether maxMergeAtOnceExplicit should be made
larger (or eliminated). Would that alter the writes the user is seeing?
All that said, pulling back the code for findForcedMerges from before
LUCENE-7976 and using it when the number of segments is specified is certainly
an option and would be a quick fix.
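For comparison, the pre-7.5 goal described in the issue (the least amount of merging that leaves at most maxSegmentCount segments) can be sketched as a single merge of the n - maxSegmentCount + 1 smallest segments: merging k segments into one removes k - 1 of them, and picking the smallest keeps bytes written down. This illustrates the idea only; it is not the old findForcedMerges code, and it ignores the maxMergeAtOnceExplicit cap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of "least amount of merging" for a forced merge to maxSegmentCount; illustrative only. */
final class MinimalForcedMergeSketch {
    /** Returns the sizes of the segments to merge into one, smallest first; empty if none. */
    static List<Long> pickMinimalMerge(List<Long> segmentSizes, int maxSegmentCount) {
        if (maxSegmentCount < 1) {
            throw new IllegalArgumentException("maxSegmentCount must be >= 1");
        }
        int excess = segmentSizes.size() - maxSegmentCount;
        if (excess <= 0) {
            return List.of(); // already at or below the target: nothing to merge
        }
        List<Long> sorted = new ArrayList<>(segmentSizes);
        sorted.sort(Comparator.naturalOrder());
        // Merging k segments into one removes k - 1 of them, so k = excess + 1.
        // NOTE: this ignores any cap such as maxMergeAtOnceExplicit on k.
        return List.copyOf(sorted.subList(0, excess + 1));
    }

    /** Bytes written by a merge: every input segment is rewritten once. */
    static long bytesWritten(List<Long> merge) {
        long total = 0;
        for (long size : merge) {
            total += size;
        }
        return total;
    }
}
```

With 21 segments of sizes 1 to 21 MB and maxSegmentCount = 20, this merges only the 1 MB and 2 MB segments (3 MB written), whereas merging all 21 segments would write 231 MB.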
> Forced merges merge more than necessary
> ---------------------------------------
>
> Key: LUCENE-8688
> URL: https://issues.apache.org/jira/browse/LUCENE-8688
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Priority: Minor
>
> A user reported some surprise after upgrading to Lucene 7.5, due to changes
> in how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5, forceMerge picked the least amount of merging that would result
> in an index with at most maxSegmentCount segments. Now that forced merges
> share the same logic as regular merges, we are almost sure to pick a
> maxMergeAtOnceExplicit-segment merge (30 segments), given that merges with
> more segments usually score better. The reason is that natural merges assume
> merging now saves work later, so the more segments merged, the better. That
> assumption doesn't hold for forced merges, which are meant to run on
> read-only indices where there won't be any future merging.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)