[
https://issues.apache.org/jira/browse/SOLR-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849025#comment-16849025
]
Andrzej Bialecki commented on SOLR-12941:
------------------------------------------
This patch contains the following changes:
* added a new metric {{SEARCHER.searcher.indexCommitSize}} that tracks the size
of only those index files that are included in the latest commit point.
* {{IndexSizeTrigger}} now uses this metric to consider only the size of the
latest commit point. Using this value it calculates an estimated effective
index size based on the percentage of non-deleted documents.
* added a unit test demonstrating that the {{aboveBytes}} condition now works
with {{splitMethod=link}}.
* improved tracking of SEARCHER metrics in the simulator.
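The estimate described in the list above can be illustrated with a minimal sketch. It is not the code from the patch: the class name, method names and sample numbers are hypothetical, and it assumes the estimate is simply the commit size scaled by the ratio of live documents ({{numDocs}}) to all documents including deleted ones ({{maxDoc}}).

```java
/**
 * Illustrative sketch only, not the code from the SOLR-12941 patch.
 * All names and numbers here are hypothetical.
 */
public final class EffectiveIndexSizeSketch {

    private EffectiveIndexSizeSketch() {}

    /**
     * Scales the size of the latest commit point by the fraction of
     * non-deleted documents.
     *
     * @param commitSizeBytes size of the files in the latest commit point
     * @param numDocs         number of live (non-deleted) documents
     * @param maxDoc          number of documents including deleted ones
     */
    public static long estimateEffectiveSize(long commitSizeBytes, int numDocs, int maxDoc) {
        if (commitSizeBytes < 0 || numDocs < 0 || maxDoc < numDocs) {
            throw new IllegalArgumentException("inconsistent index statistics");
        }
        if (maxDoc == 0) {
            // Empty index: there is nothing to scale.
            return commitSizeBytes;
        }
        double liveRatio = (double) numDocs / maxDoc;
        return Math.round(commitSizeBytes * liveRatio);
    }

    /** Checks the estimated size, not the raw size, against the threshold. */
    public static boolean exceedsAboveBytes(long commitSizeBytes, int numDocs,
                                            int maxDoc, long aboveBytes) {
        return estimateEffectiveSize(commitSizeBytes, numDocs, maxDoc) > aboveBytes;
    }

    public static void main(String[] args) {
        long aboveBytes = 1_000_000_000L;   // hypothetical threshold
        long parentSize = 1_200_000_000L;   // hypothetical parent shard size

        // Parent shard without deletions: the estimate equals the raw size.
        System.out.println("parent exceeds threshold: "
                + exceedsAboveBytes(parentSize, 1000, 1000, aboveBytes));

        // Sub-shard right after a hard-link split (assumed scenario): the files
        // are still as large as the parent's, but half of the documents are
        // marked as deleted.
        System.out.println("sub-shard exceeds threshold: "
                + exceedsAboveBytes(parentSize, 500, 1000, aboveBytes));
    }
}
```

With the raw size, the sub-shard in this example would still be above the threshold and would be split again. With the scaled estimate it falls below the threshold, at the cost of the estimation error mentioned in the issue description.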
If there are no objections I'd like to commit this shortly.
> IndexSizeTrigger and splitMethod=link problems
> ----------------------------------------------
>
> Key: SOLR-12941
> URL: https://issues.apache.org/jira/browse/SOLR-12941
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.6, 8.0
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
> Attachments: SOLR-12941.patch
>
>
> {{IndexSizeTrigger}} can be configured to use {{splitMethod=link}}
> (SOLR-12730), which uses hard-linking for creating sub-shards.
> However, if the trigger uses the {{aboveBytes}} condition, the resulting
> sub-shards will not immediately decrease in size; they shrink only once all
> of the deleted documents have been expunged (either by gradual merges or by
> an explicit and costly expungeDeletes command). As a result, the new
> sub-shards will still exceed the {{aboveBytes}} threshold, which will cause
> the trigger to keep generating new split requests.
> I see two options for solving this:
> * disallow using {{aboveBytes}} with {{splitMethod=link}}. Unfortunately,
> this is a very desirable combination because it monitors the actual
> index size and uses the fast splitting method.
> * calculate an internal estimate of "eventual index size" for an index with
> deletions, and use this estimate when checking with {{aboveBytes}} instead of
> the real index size. This of course introduces a potentially significant
> estimation error, but it allows hard-linked sub-shards with deletions to be
> treated properly as (eventually) significantly smaller than the parent shard.