[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213021#comment-16213021 ]

Erick Erickson commented on LUCENE-7976:
----------------------------------------

Mike:

bq: Lucene is quite good at skipping deleted docs during search....

That's not the nub of the issue for me. I'm seeing very large indexes; 200-300G 
on a single core is quite common lately. We have customers approaching 100T 
indexes in aggregate in single Solr collections. And that problem will only get 
worse as hardware improves, especially if Java's GC algorithms evolve to work 
smoothly with larger heaps. BTW, this is not theoretical: I have a client using 
Azul's Zing with Java heaps approaching 80G. It's an edge case to be sure, but 
similar setups will become more common.

So 50% deleted documents consume a _lot_ of resources, both disk and RAM, when 
considered in aggregate at that scale. I realize that any of the options here 
will increase I/O, but that's preferable to having to provision a new data 
center because you're physically out of space and can't add more machines or 
even attach more storage to the current ones.

bq: maybe we could simply relax TMP so that even max sized segments that have < 
50% deletions are eligible for merging

Just to be sure I understand this... Are you saying that we make it possible to 
merge, say, one segment with 3.5G of live docs and 5 other segments of 0.3G 
each, since the combined live size stays within the 5G cap? That seems like 
it'd work.
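
If that's the idea, here's how I picture the eligibility check. A sketch only; 
liveBytes/canJoinMerge are names I just made up for illustration, not anything 
in TMP today:

{code:java}
public class RelaxedEligibilitySketch {

    /** Bytes attributable to non-deleted documents in a segment. */
    static long liveBytes(long totalBytes, int maxDoc, int delCount) {
        return (long) (totalBytes * (1.0 - (double) delCount / maxDoc));
    }

    /** True if adding this candidate keeps the merged result within the cap,
     *  judging every segment by its live size rather than its size on disk. */
    static boolean canJoinMerge(long candidateLiveBytes,
                                long liveBytesInMergeSoFar,
                                long maxMergedSegmentBytes) {
        return candidateLiveBytes + liveBytesInMergeSoFar <= maxMergedSegmentBytes;
    }
}
{code}

With the default 5G cap, 3.5G of live docs plus five 0.3G segments comes to 
exactly 5G, so that merge would be allowed.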

That still leaves the question of what happens when someone actually does have 
a huge segment as a result of force merging. I know, I know, "don't do that" 
and "get rid of the big red optimize button in the Solr admin screen and stop 
talking about it!". I suppose your suggestion can tackle that too if we extend 
your "relax TMP so that...." idea with a "singleton merge" for the edge case 
where the _result_ of the merge would be > max segment size.

Thanks for your input! Let's just say I have a lot more faith in your knowledge 
of this code than in my own...

> Add a parameter to TieredMergePolicy to merge segments that have more than X 
> percent deleted documents
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7976
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7976
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>
> We're seeing situations "in the wild" where very large (on-disk) indexes are 
> handled quite easily in a single Lucene index. This is particularly true as 
> features like docValues move data into MMapDirectory space. The current TMP 
> algorithm allows on the order of 50% deleted documents, as per a dev list 
> conversation with Mike McCandless (and his blog here: 
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
>
> Especially in the current era of very large aggregate indexes (think many 
> TB), solutions like "you need to distribute your collection over more 
> shards" become very costly. Additionally, the tempting "optimize" button 
> exacerbates the issue: once you form, say, a 100G segment (by 
> optimizing/forceMerging), it is not eligible for merging until 97.5G of the 
> docs in it are deleted. With the current default 5G max segment size, a 
> max-sized segment only becomes a merge candidate once its live data drops 
> below half the max, i.e. 2.5G.
> The proposal here would be to add a new parameter to TMP, something like 
> <maxAllowedPctDeletedInBigSegments> (no, that's not a serious name; 
> suggestions welcome), which would default to 100 (i.e. the same behavior we 
> have now).
>
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases:
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten.
> >> the segment has > 5G "live" docs (the result of a forceMerge or 
> >> optimize). It would be rewritten into a single segment, removing all 
> >> deleted docs, no matter how big it is to start. The 100G example above 
> >> would be rewritten to an 80G segment, for instance.
> Of course this would lead to potentially much more I/O, which is why the 
> default would be the same behavior we see now. As it stands, though, there's 
> no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes "in the wild" at this 
> point, with 10s of shards replicated 3 or more times. And that doesn't even 
> count deployments of these over HDFS.
>
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.
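
For concreteness, a rough sketch of the two cases the description above lays 
out. The class, method, and parameter names here are all hypothetical, not a 
concrete API proposal:

{code:java}
public class DeletePctSketch {

    enum Action { LEAVE_ALONE, MERGE_WITH_SMALLER, SINGLETON_REWRITE }

    /**
     * @param liveBytes            bytes of non-deleted docs in the segment
     * @param pctDeleted           percentage of deleted docs in the segment
     * @param maxSegmentBytes      the TMP max merged segment size (5G default)
     * @param maxAllowedPctDeleted the proposed knob; 100 = today's behavior
     */
    static Action decide(long liveBytes, double pctDeleted,
                         long maxSegmentBytes, double maxAllowedPctDeleted) {
        if (pctDeleted <= maxAllowedPctDeleted) {
            return Action.LEAVE_ALONE; // the default of 100 never fires
        }
        if (liveBytes < maxSegmentBytes) {
            // Case 1: merge with smaller segments, topping up toward the cap.
            return Action.MERGE_WITH_SMALLER;
        }
        // Case 2: force-merge leftovers; rewrite alone, dropping deletes.
        return Action.SINGLETON_REWRITE;
    }
}
{code}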


