[ https://issues.apache.org/jira/browse/LUCENE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543465#comment-16543465 ]
Erick Erickson commented on LUCENE-8263:
----------------------------------------

+1 to the patch.

There were some weird edge cases in TestTieredMergePolicy that only came to light when I beasted it, so I'll run a few thousand iterations over the weekend and report back if any pop out.

Your simulation numbers square pretty well with mine from when I was doing this one at the same time as 7976.

I originally advocated _not_ putting a floor on the percentage, which would have given users one more way to shoot themselves in the foot. I've changed my mind on that; I think 20% is fine. Now that they can forceMerge or expungeDeletes without creating massive segments, I don't think there's any good (or even bad) reason to allow < 20%.

Thanks again for working on this and for your help with 7976. Much appreciated.

> Add indexPctDeletedTarget as a parameter to TieredMergePolicy to control more aggressive merging
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8263
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8263
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: LUCENE-8263.patch
>
>
> Spinoff of LUCENE-7976 to keep the two issues separate.
> The current TMP allows up to 50% deleted docs, which can be wasteful on large indexes. This parameter will merge segments with deleted documents more aggressively when the _total_ percentage of deleted docs in the entire index exceeds it.
> Setting this to 50% should approximate current behavior. Setting it to 20% caused the first cut at this to increase I/O by roughly 10%. Setting it to 10% caused about a 50% increase in I/O.
> I was conflating the two issues, so I'll change 7976 and comment out the bits that reference this new parameter. After it's checked in we can bring this back. That should be less work than reconstructing this later.
> Among the questions to be answered:
> 1> What should the default be? I propose 20%, as it results in significantly less wasted space and helps control heap usage for a modest increase in I/O.
> 2> What should the floor be? I propose 10%, with _strong_ documentation warnings about not setting it below 20%.
> 3> Should there be two parameters? I think this was discussed somewhat in 7976. The first cut at this used this number for two purposes:
> 3a> the total percentage of deleted docs, index-wide, needed to trip this trigger
> 3b> the percentage of deleted docs an _individual_ segment larger than maxSegmentSize/2 bytes had to contain in order to be eligible for merging.
> Empirically, using the same percentage for both caused the merging to hover around the value specified for this parameter.
> My proposal for <3> would be to have the parameter do double duty. Assuming my preliminary results hold, you set this parameter to, say, 20%, and once the index hits that percentage of deleted docs it hovers right around there, even if you've forceMerged down to 1 segment earlier. This seems in line with what I'd expect, and adding another parameter seems excessively complicated for no good purpose. We could always add something like that later if we wanted.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
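The double-duty proposal in <3> can be illustrated with a small Java sketch. This is not the code in LUCENE-8263.patch and uses no Lucene API. DeletesTargetSketch, Segment, checkTarget, indexPctDeleted and eligibleLargeSegments are all hypothetical names. The sketch makes several assumptions. It uses the 20% floor from the comment above and a 50% ceiling, since 50% is said to approximate current behavior. It reads "exceeds" as strictly greater than the target. It models only the large-segment rule from 3b, leaving smaller segments to TMP's normal selection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are Lucene APIs.
final class DeletesTargetSketch {

    // Assumed bounds: the 20% floor follows the comment above; the 50% ceiling
    // reflects the note that 50% approximates the existing TMP behavior.
    static final double MIN_PCT = 20.0;
    static final double MAX_PCT = 50.0;

    // Minimal stand-in for per-segment statistics.
    static final class Segment {
        final String name;
        final long sizeBytes;
        final int maxDoc;
        final int delCount;

        Segment(String name, long sizeBytes, int maxDoc, int delCount) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.maxDoc = maxDoc;
            this.delCount = delCount;
        }

        double pctDeleted() {
            return maxDoc == 0 ? 0.0 : 100.0 * delCount / maxDoc;
        }
    }

    // Rejects targets outside the assumed [MIN_PCT, MAX_PCT] range.
    static void checkTarget(double pct) {
        if (pct < MIN_PCT || pct > MAX_PCT) {
            throw new IllegalArgumentException(
                "target must be in [" + MIN_PCT + ", " + MAX_PCT + "], got " + pct);
        }
    }

    // 3a: deleted docs as a percentage of all docs in the index.
    static double indexPctDeleted(List<Segment> segments) {
        long maxDoc = 0;
        long deleted = 0;
        for (Segment s : segments) {
            maxDoc += s.maxDoc;
            deleted += s.delCount;
        }
        return maxDoc == 0 ? 0.0 : 100.0 * deleted / maxDoc;
    }

    // Returns names of segments larger than maxSegmentSizeBytes / 2 that qualify
    // for deletes-driven merging. The same target serves as the index-wide
    // trigger (3a) and the per-segment threshold (3b).
    static List<String> eligibleLargeSegments(List<Segment> segments,
                                              long maxSegmentSizeBytes,
                                              double target) {
        checkTarget(target);
        List<String> eligible = new ArrayList<>();
        if (indexPctDeleted(segments) <= target) {
            return eligible; // index-wide trigger not tripped
        }
        for (Segment s : segments) {
            if (s.sizeBytes > maxSegmentSizeBytes / 2 && s.pctDeleted() > target) {
                eligible.add(s.name);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        long maxSeg = 5L << 30; // 5 GiB, chosen for illustration
        List<Segment> segs = List.of(
            new Segment("_a", 4L << 30, 1_000_000, 300_000), // large, 30% deleted
            new Segment("_b", 4L << 30, 1_000_000, 100_000), // large, 10% deleted
            new Segment("_c", 1L << 30, 250_000, 100_000));  // small, 40% deleted
        // Index-wide: 500,000 / 2,250,000 is about 22.2%, above a 20% target.
        System.out.println(eligibleLargeSegments(segs, maxSeg, 20.0)); // prints [_a]
    }
}
```

With a 20% target, the index-wide figure of about 22.2% trips the trigger. Only _a is selected, because it is both larger than half the maximum segment size and more than 20% deleted. Raising the target to 25% leaves the trigger untripped, so nothing is selected. Using one number for both checks is the behavior the proposal reports as keeping the index hovering near the target.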