[ https://issues.apache.org/jira/browse/SOLR-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036443#comment-15036443 ]
Ayon Sinha commented on SOLR-8225:
----------------------------------
This problem is becoming a deal-breaker for any Solr cluster. The larger the
cluster, the higher the likelihood that at least one replica is unhealthy,
slow, or recovering. As things stand, indexing grinds to a halt whenever one
or more replicas are recovering.
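To make the scaling argument concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each replica is independently degraded with some probability p; the class name and the 1% figure are hypothetical, not measurements from any cluster. The chance that at least one of n replicas is degraded is then 1 - (1 - p)^n, which climbs quickly as n grows.

```java
// Hypothetical illustration: probability that at least one of n replicas is
// degraded, assuming each replica is independently degraded with probability p.
final class DegradedReplicaOdds {
    static double atLeastOneDegraded(int replicas, double p) {
        // Complement of "every replica is healthy": 1 - (1 - p)^n
        return 1.0 - Math.pow(1.0 - p, replicas);
    }

    public static void main(String[] args) {
        // Assumed per-replica degradation probability of 1%
        for (int n : new int[] {3, 30, 300}) {
            System.out.printf("replicas=%d -> P(at least one degraded)=%.3f%n",
                    n, atLeastOneDegraded(n, 0.01));
        }
    }
}
```

With p = 0.01 this gives roughly 3% for 3 replicas, 26% for 30, and 95% for 300, so in a large cluster "some replica is degraded" becomes the normal case rather than the exception.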
To begin this fix, we MUST at least add a setting under which the leader does
not send updates to a recovering replica at all. The replica should instead get
those updates from wherever it is recovering from.
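A minimal sketch of the proposed setting, assuming a simplified replica model: `ReplicaState`, `ReplicaRef`, `UpdateTargets` and the `skipRecovering` flag are hypothetical names for illustration, not Solr classes or configuration options. With the flag on, the leader simply leaves recovering replicas out of the set it forwards an update to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified replica model (not Solr's own classes).
enum ReplicaState { ACTIVE, RECOVERING, DOWN }

final class ReplicaRef {
    final String name;
    final ReplicaState state;

    ReplicaRef(String name, ReplicaState state) {
        this.name = name;
        this.state = state;
    }
}

final class UpdateTargets {
    /**
     * Chooses which replicas the leader forwards an update to.
     * skipRecovering models the proposed (hypothetical) setting.
     */
    static List<ReplicaRef> select(List<ReplicaRef> replicas, boolean skipRecovering) {
        List<ReplicaRef> targets = new ArrayList<>();
        for (ReplicaRef r : replicas) {
            if (r.state == ReplicaState.DOWN) {
                continue; // in this sketch, down replicas never receive updates
            }
            if (skipRecovering && r.state == ReplicaState.RECOVERING) {
                continue; // proposed behavior: leave recovering replicas out entirely
            }
            targets.add(r);
        }
        return targets;
    }
}
```

How a skipped replica later catches up is outside this sketch; the comment only says it should get the updates from wherever it is recovering from.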
[[email protected]] Can you please comment on the best way to handle this? We
can take this on and submit the patch.
This patch needs to be considered together with
https://issues.apache.org/jira/browse/SOLR-8227.
> Leader should send update requests to replicas in recovery asynchronously
> -------------------------------------------------------------------------
>
> Key: SOLR-8225
> URL: https://issues.apache.org/jira/browse/SOLR-8225
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Timothy Potter
>
> When a replica goes into recovery, the leader still sends docs to that
> replica while it is recovering. What I'm seeing is that the recovering node
> is still slow to respond to the leader (at least slower than the healthy
> replicas). Thus it would be good if the leader could send the updates to the
> recovering replica asynchronously, i.e. the leader will block as it does
> today when forwarding updates to healthy / active replicas, but send updates
> to recovering replicas async, thus preventing the whole update request from
> being slowed down by a potentially degraded replica.
> FWIW - I've actually seen this occur in an environment that has more than 3
> replicas per shard. One of the replicas went into recovery and then was much
> slower to handle requests than the healthy replicas, but the leader had to
> wait for the slowest replica.
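The asynchronous forwarding described in the issue can be sketched with `java.util.concurrent.CompletableFuture`: the leader waits for acknowledgements from active replicas only, and merely starts the sends to recovering replicas. `LeaderForwarder`, its `forward` method, and the `send` callback (a stand-in for the real request to a replica) are hypothetical names; this is not Solr's actual update-distribution code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// Hypothetical leader-side forwarder; 'send' stands in for the real per-replica request.
final class LeaderForwarder {
    private final ExecutorService pool;

    LeaderForwarder(ExecutorService pool) {
        this.pool = pool;
    }

    /**
     * Forwards one update. Blocks until every active replica has been sent the
     * update, but only starts the sends to recovering replicas and returns their
     * futures without waiting on them.
     */
    List<CompletableFuture<Void>> forward(List<String> active, List<String> recovering,
                                          Consumer<String> send) {
        List<CompletableFuture<Void>> sync = new ArrayList<>();
        for (String replica : active) {
            sync.add(CompletableFuture.runAsync(() -> send.accept(replica), pool));
        }
        List<CompletableFuture<Void>> async = new ArrayList<>();
        for (String replica : recovering) {
            // Assumption: a failed send to a recovering replica must not fail the request.
            async.add(CompletableFuture.runAsync(() -> send.accept(replica), pool)
                    .exceptionally(err -> null));
        }
        // Wait only for the active replicas; recovering ones finish in the background.
        CompletableFuture.allOf(sync.toArray(new CompletableFuture[0])).join();
        return async;
    }
}
```

With this shape, the request's latency is bounded by the slowest active replica rather than the slowest replica overall, which is the situation reported above. Silently swallowing failures from recovering replicas is an assumption of the sketch; a real implementation would have to decide how such failures affect the replica's recovery.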
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)