[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211959#comment-16211959
 ] 

Erick Erickson commented on LUCENE-7976:
----------------------------------------

What Yonik said.

+1 to working up a patch. I actually think this is pretty important.

bq: Also keep a lower bound check so users can't set a delete threshold below 
20%.

Don't know. This is another arbitrary decision that may or may not apply. 
Perhaps a _strongly_ worded suggestion that this be the lower bound and a WARN 
message on startup if they specify < 20%? 20% of a 10TB (aggregate across 
shards) index is still a lot. I don't have strong feelings here though.

Hmmm. If you have a setter like setMaxDeletePctBeforeSingletonMerge(double pct) 
then through reflection you can just specify
<double name="maxDeletePctBeforeSingletonMerge">5</double>
in the merge policy and it'll automagically get picked up. Then we don't 
advertise it, making it truly expert.... 
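
A minimal sketch of the reflection idea, combined with the WARN-instead-of-hard-limit suggestion above. Everything here is hypothetical: SketchMergePolicy is a stand-in rather than the real TieredMergePolicy, ReflectiveConfig is not how Solr's config loading is actually implemented, and the default of 100 is just an assumption meaning "current behavior".

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a merge policy; NOT the real TieredMergePolicy.
class SketchMergePolicy {
    // Assumed default: 100 means "never force a singleton merge".
    private double maxDeletePctBeforeSingletonMerge = 100.0;

    public void setMaxDeletePctBeforeSingletonMerge(double pct) {
        if (pct < 0.0 || pct > 100.0) {
            throw new IllegalArgumentException("pct must be in [0, 100], got " + pct);
        }
        if (pct < 20.0) {
            // Warn rather than reject, per the suggestion in this comment.
            System.err.println("WARN: delete threshold " + pct
                + "% is below the suggested 20% and may cause heavy I/O");
        }
        this.maxDeletePctBeforeSingletonMerge = pct;
    }

    public double getMaxDeletePctBeforeSingletonMerge() {
        return maxDeletePctBeforeSingletonMerge;
    }
}

// Hypothetical helper: maps a config entry name to a setter and invokes it.
class ReflectiveConfig {
    static void apply(Object target, String name, double value) {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method m = target.getClass().getMethod(setter, double.class);
            m.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot apply setting " + name, e);
        }
    }

    public static void main(String[] args) {
        SketchMergePolicy policy = new SketchMergePolicy();
        // Equivalent of <double name="maxDeletePctBeforeSingletonMerge">5</double>
        apply(policy, "maxDeletePctBeforeSingletonMerge", 5.0);
        System.out.println(policy.getMaxDeletePctBeforeSingletonMerge());
    }
}
```

Because the name in the config entry is turned into a setter name, adding the setter is enough to make the option available without documenting it anywhere.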

bq:  ...pick a segment which is 5G in size and more than the threshold 
deletes...

Minor refinement. Pick a segment > 2.5G "live" documents and > X% deleted docs 
and merge it. That way we merge a 4G segment with 20% deleted into a 3.2G 
segment. Rinse and repeat until it has < 2.5G live docs, at which point it's 
eligible for regular merging.
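
The selection rule above can be sketched roughly as follows. This is only an illustration: sizes are in GB, live size is assumed to shrink in proportion to the percentage of deleted docs, and none of these names exist in Lucene.

```java
// Sketch only. Assumes live size scales linearly with the fraction of
// live documents; all names are hypothetical, not Lucene API.
class SingletonMergeSketch {
    static final double MAX_SEGMENT_GB = 5.0;                  // current default max segment size
    static final double LIVE_FLOOR_GB = MAX_SEGMENT_GB / 2.0;  // the 2.5G cutoff

    // Estimated size of the segment once deleted docs are dropped.
    static double liveGb(double sizeGb, double deletedPct) {
        return sizeGb * (1.0 - deletedPct / 100.0);
    }

    // "Pick a segment > 2.5G live documents and > X% deleted docs."
    static boolean eligibleForSingletonMerge(double sizeGb, double deletedPct, double thresholdPct) {
        return liveGb(sizeGb, deletedPct) > LIVE_FLOOR_GB && deletedPct > thresholdPct;
    }

    // "Rinse and repeat": count rewrites until the live size is no longer
    // above the floor, assuming each rewrite happens at the same deleted
    // percentage (a simplification).
    static int rewritesUntilRegular(double sizeGb, double deletedPctAtEachRewrite) {
        if (deletedPctAtEachRewrite <= 0.0 || deletedPctAtEachRewrite > 100.0) {
            throw new IllegalArgumentException("deleted pct must be in (0, 100]");
        }
        int rewrites = 0;
        while (liveGb(sizeGb, deletedPctAtEachRewrite) > LIVE_FLOOR_GB) {
            sizeGb = liveGb(sizeGb, deletedPctAtEachRewrite);
            rewrites++;
        }
        return rewrites;
    }

    public static void main(String[] args) {
        // The 4G segment with 20% deleted, using an assumed threshold of 10%.
        System.out.println(eligibleForSingletonMerge(4.0, 20.0, 10.0));
        System.out.println(liveGb(4.0, 20.0));
        // A 100G optimized segment rewritten each time it reaches 20% deleted.
        System.out.println(rewritesUntilRegular(100.0, 20.0));
    }
}
```

Under these assumptions a large optimized segment takes many rewrites to get below the cutoff, which is the "painful, but it eventually gets there" trade-off.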

The sweet thing about this is that it would allow users to _recover_ from an 
optimize. Currently if they do hit that big red button and optimize they can't 
reclaim the space held by deleted documents until that single huge segment has 
< 2.5G live docs. 
Something like this will keep rewriting that segment into smaller and smaller 
(though still large) segments and it'll eventually disappear. Mind you it'll be 
painful, but at least it'll eventually get there.

I'm not sure whether to make this behavior the default for TieredMergePolicy 
or not. Other than rewriting very large segments, the current policy is 
essentially this with X being 50%. Despite my comments about keeping reflection 
above, WDYT about just making this explicit? That is, default a parameter like 
"largeSegmentMaxDeletePct" to 50?

And for a final thought, WDYT about Mike's idea of making 
optimize/forceMerge/expungeDeletes respect maxSegmentSize? I think we still 
need to rewrite segments as this JIRA proposes since the current policy can 
hover around 50%. I'm lukewarm to making optimize respect max segment size 
since it would change that behavior, but I don't have strong feelings on it.

> Add a parameter to TieredMergePolicy to merge segments that have more than X 
> percent deleted documents
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7976
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
> <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name, suggestions 
> welcome) which would default to 100 (or the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)