[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192266#comment-16192266 ]

Erick Erickson commented on LUCENE-7976:
----------------------------------------

There are two issues here that are a bit conflated: the consequences of 
forceMerge, and having up to 50% of your index space used up by deleted docs:

1> If they _do_ optimize/forceMerge/expungeDeletes, they're stuck. I totally 
agree that having a big red button makes that way too tempting. But even if 
the button is removed, users can still issue the optimize call from the SolrJ 
client and/or via the update handler (see the sketch after this list). So one 
issue is whether there are ways to prevent the unfortunate consequences (the 
freeze idea, only optimizing into segments no larger than the max segment 
size, etc.) or to recover somehow (some of the proposals above). Keeping the 
number of deleted docs lower would make pressing that button less tempting, 
but the button should still be removed even so.

2> Even if they _don't_ forceMerge/expungeDeletes, having 50% of the index 
consumed by deleted docs can be quite costly. Telling users that their only 
two choices are 1> start optimizing and keep optimizing forever or 2> buy 
enough hardware to meet their SLAs with half their index space wasted is a 
hard sell. We have people who need 100s of machines in their clusters to hit 
their SLAs. Accepting up to 50% deleted docs as the norm means potentially 
millions of dollars in unnecessary hardware.
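
For concreteness, here is a minimal sketch of the non-UI paths mentioned in 
1>. The base URL and collection name are placeholders; SolrClient.optimize() 
is the standard SolrJ call:

    // Minimal sketch: triggering an optimize from SolrJ even with the
    // admin UI button removed. Base URL and collection name are placeholders.
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class OptimizeAnyway {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycollection").build()) {
                client.optimize(); // same forceMerge the big red button issues
            }
        }
    }

The update handler path is just as short, e.g. 
curl 'http://localhost:8983/solr/mycollection/update?optimize=true', which is 
why removing the button only makes the operation less tempting, not 
impossible.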



> Add a parameter to TieredMergePolicy to merge segments that have more than X 
> percent deleted documents
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7976
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) that are handled quite easily in a single Lucene index. This is 
> particularly true as features like docValues move data into MMapDirectory 
> space. The current TMP algorithm allows on the order of 50% deleted 
> documents, as per a dev list conversation with Mike McCandless (and his blog 
> here: https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large aggregate indexes (think many 
> TB), solutions like "you need to distribute your collection over more 
> shards" become very costly. Additionally, the tempting "optimize" button 
> exacerbates the issue: once you form, say, a 100G segment (by 
> optimizing/forceMerging), it is not eligible for merging until 97.5G of the 
> docs in it are deleted (with the current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
> <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name; 
> suggestions welcome), which would default to 100 (i.e. the same behavior we 
> have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases:
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten.
> >> the segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment, removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment, for instance.
> Of course this would lead to potentially much more I/O, which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index 
> from scratch. We routinely see 200G-300G Lucene indexes at this point "in 
> the wild", with 10s of shards replicated 3 or more times. And that doesn't 
> even include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.
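
To make the quoted proposal concrete, here is a rough sketch of the selection 
rule it describes. This is the decision test only, not an actual 
TieredMergePolicy patch, and maxAllowedPctDeleted is a placeholder name (the 
description itself says the real name is up for discussion):

    public class DeletePctRule {
        // Hypothetical eligibility test for the proposed TMP parameter.
        // The default of 100 reproduces current behavior: deletes alone
        // never force a large segment into a merge.
        static boolean forcedMergeEligible(int maxDoc, int delCount,
                                           double maxAllowedPctDeleted) {
            double pctDeleted = 100.0 * delCount / maxDoc;
            return pctDeleted > maxAllowedPctDeleted;
        }

        public static void main(String[] args) {
            // 25% of docs deleted with the threshold at 20%: eligible,
            // no matter how large the segment is on disk. Prints "true".
            System.out.println(forcedMergeEligible(1_000_000, 250_000, 20.0));
        }
    }

An eligible segment with under 5G of live docs would then be packed with 
smaller segments as usual; one with more than 5G of live docs would be 
rewritten singly, which is what makes the 100G forceMerged segment 
recoverable.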


