[
https://issues.apache.org/jira/browse/SOLR-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311244#comment-15311244
]
Lanny Ripple commented on SOLR-7820:
------------------------------------
Experiencing this right now, since as a startup pinching pennies isn't optional.
We're about 70% allocated on disk with 60 or so shards over a dozen or two
collections. If a couple of replicas throw a hissy fit, it's not a big deal for
Solr to recover. If a node goes down, or in one case the AWS instance starts
being flaky, then we fill the disk and get to spend a lot of time babysitting
the recovery.
If having Solr sequence recoveries to avoid blowing out the disk isn't a good
idea, then please at least expose tooling to make it easier for a human to do
the same thing. Even a way to start Solr without immediately trying to sync
would be a win. When Solr goes all-in on recovery, the Collections API times
out on DELETEREPLICA.
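To make "sequencing" concrete, here is a rough sketch of the kind of admission
check I have in mind. It is not Solr code: the class and method names are
hypothetical, and it assumes the expected index size of each pending replica
is already known. It starts only the recoveries that fit in the currently free
space and leaves the rest for a later pass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: picks which pending recoveries
// can start now without exceeding the free disk space.
public class RecoverySequencer {

    /**
     * Walks the pending replicas in order and admits each one whose expected
     * index size still fits in the remaining free space.
     *
     * @param pendingSizes expected index size in bytes for each pending replica
     * @param freeBytes    disk space currently available for recovery
     * @return positions in pendingSizes of the replicas to start now
     */
    static List<Integer> nextBatch(long[] pendingSizes, long freeBytes) {
        List<Integer> batch = new ArrayList<>();
        long remaining = freeBytes;
        for (int i = 0; i < pendingSizes.length; i++) {
            if (pendingSizes[i] <= remaining) {
                batch.add(i);
                remaining -= pendingSizes[i];
            }
            // Replicas that do not fit are skipped and retried on a later
            // pass, once earlier recoveries have finished and freed space.
        }
        return batch;
    }
}
```

Calling nextBatch again after each recovery completes, with the updated free
space, would drain the queue without ever over-committing the disk.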
> IndexFetcher should calculate ahead of time how much space is needed for full
> snapshot based recovery and cleanly abort instead of trying and running out
> of space on a node
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-7820
> URL: https://issues.apache.org/jira/browse/SOLR-7820
> Project: Solr
> Issue Type: Improvement
> Components: replication (java)
> Reporter: Timothy Potter
>
> When a replica is trying to recover and its IndexFetcher decides it needs to
> pull the full index from a peer (isFullCopyNeeded == true), then the existing
> index directory should be deleted before the full copy is started to free up
> disk to pull a fresh index, otherwise the server will potentially need 2x the
> disk space (old + incoming new). Currently, the IndexFetcher removes the
> index directory after the new one is downloaded; however, once the fetcher
> decides a full copy is needed, what is the value of the existing index? It's
> clearly out-of-date and should not serve queries. Since we're deleting data
> preemptively, maybe this should be an advanced configuration property, only
> to be used by those that are disk-space constrained (which I'm seeing more
> and more with people deploying high-end SSDs - they typically don't have 2x
> the disk capacity required by an index).
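The "calculate ahead of time and cleanly abort" idea from the issue title
could look roughly like the sketch below. This is an illustration only, not
the actual IndexFetcher implementation: the class, the method names, and the
reserveBytes safety margin are assumptions, and it presumes the fetcher
already holds a map of file names to sizes for the files the peer would send.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical pre-flight check, not Solr's IndexFetcher: decides whether a
// full index copy fits on disk before any bytes are downloaded.
public class FullCopyPreflight {

    /** Total bytes the peer would send, given a file-name-to-size map. */
    static long requiredBytes(Map<String, Long> remoteFileSizes) {
        long total = 0;
        for (long size : remoteFileSizes.values()) {
            total += size;
        }
        return total;
    }

    /** True if the copy fits while keeping reserveBytes of space untouched. */
    static boolean fits(long requiredBytes, long usableBytes, long reserveBytes) {
        return requiredBytes <= usableBytes - reserveBytes;
    }

    /**
     * Checks the file store that holds indexDir. A caller would abort the
     * recovery cleanly when this returns false instead of starting the copy.
     */
    static boolean canStartFullCopy(Path indexDir,
                                    Map<String, Long> remoteFileSizes,
                                    long reserveBytes) throws IOException {
        FileStore store = Files.getFileStore(indexDir);
        return fits(requiredBytes(remoteFileSizes), store.getUsableSpace(), reserveBytes);
    }
}
```

Note that with the current behavior (old index kept until the new one is
downloaded) the usable space must cover the whole incoming index; if the old
index were deleted first, its size could be added to the usable figure.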