Erick Erickson commented on LUCENE-8263:

[~jpountz] To reproduce the TestTieredMergePolicy failure, all that's 
necessary is the seed:


This is _not_ a result of the code changes for this JIRA; I happened to notice 
it in a Jenkins build and chased it down. It turns out to be a rounding error 
that's been there forever. The max segment bytes in TieredMergePolicy is 1585. 
The test tries to calculate 125% like this:

final long max125Pct = (long) ((tmp.getMaxMergedSegmentMB() * 1024.0 * 1024.0) 
* 1.25);
which gives a value of 1280. It should be closer to 1981, which would pass the 
test.

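It all works if we change TMP.getMaxMergedSegmentMB() from
    return maxMergedSegmentBytes/1024/1024.;
to
    return maxMergedSegmentBytes/1024./1024.;
(note the additional decimal point in the first 1024). In the current version 
the first division is integer division, so 1585/1024 truncates to 1 before the 
floating-point division ever happens.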
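
For anyone who wants to see the arithmetic in isolation, here is a minimal 
standalone sketch. It assumes maxMergedSegmentBytes is a long; the class and 
method names are made up for illustration and are not part of 
TieredMergePolicy. Only the two return expressions and the 125% calculation 
are taken from the code quoted above.

```java
// Hypothetical demo class, not Lucene code.
final class MaxMergedSegmentRounding {

    // Same expression as the current getter: the first division is long/long,
    // so the remainder is dropped before the floating-point division.
    static double mbTruncating(long maxMergedSegmentBytes) {
        return maxMergedSegmentBytes / 1024 / 1024.;
    }

    // Same expression as the proposed getter: both divisions are floating point.
    static double mbExact(long maxMergedSegmentBytes) {
        return maxMergedSegmentBytes / 1024. / 1024.;
    }

    // The 125% calculation as the test performs it.
    static long max125Pct(double maxMergedSegmentMB) {
        return (long) ((maxMergedSegmentMB * 1024.0 * 1024.0) * 1.25);
    }

    public static void main(String[] args) {
        long bytes = 1585;
        System.out.println(max125Pct(mbTruncating(bytes))); // prints 1280
        System.out.println(max125Pct(mbExact(bytes)));      // prints 1981
    }
}
```

The difference only shows up when the byte count is not a multiple of 1024, 
and it is only large in relative terms when the limit is tiny.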

This is something of a test artifact since having such tiny limits on the 
segment size is extremely artificial.

Do you want to add that to the patch? Separate JIRA?

> Add indexPctDeletedTarget as a parameter to TieredMergePolicy to control more 
> aggressive merging
> ------------------------------------------------------------------------------------------------
>                 Key: LUCENE-8263
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8263
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: LUCENE-8263.patch
> Spinoff of LUCENE-7976 to keep the two issues separate.
> The current TMP allows up to 50% deleted docs, which can be wasteful on large 
> indexes. This parameter will do more aggressive merging of segments with 
> deleted documents when the _total_ percentage of deleted docs in the entire 
> index exceeds it.
> Setting this to 50% should approximate current behavior. Setting it to 20% 
> caused the first cut at this to increase I/O roughly 10%. Setting it to 10% 
> caused about a 50% increase in I/O.
> I was conflating the two issues, so I'll change 7976 and comment out the bits 
> that reference this new parameter. After it's checked in we can bring this 
> back. That should be less work than reconstructing this later.
> Among the questions to be answered:
> 1> what should the default be? I propose 20% as it results in significantly 
> less space wasted and helps control heap usage for a modest increase in I/O.
> 2> what should the floor be? I propose 10% with _strong_ documentation 
> warnings about not setting it below 20%.
> 3> should there be two parameters? I think this was discussed somewhat in 
> 7976. The first cut at this used this number for two purposes:
> 3a> the total percentage of deleted docs index-wide to trip this trigger
> 3b> the percentage of an _individual_ segment that had to be deleted if the 
> segment was over maxSegmentSize/2 bytes in order to be eligible for merging. 
> Empirically, using the same percentage for both caused the merging to hover 
> around the value specified for this parameter.
> My proposal for <3> would be to have the parameter do double-duty. Assuming 
> my preliminary results hold, you specify this parameter at, say, 20% and once 
> the index hits that % deleted docs it hovers right around there, even if 
> you've forceMerged earlier down to 1 segment. This seems in line with what 
> I'd expect and adding another parameter seems excessively complicated to no 
> good purpose. We could always add something like that later if we wanted.
