[ 
https://issues.apache.org/jira/browse/SOLR-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036443#comment-15036443
 ] 

Ayon Sinha commented on SOLR-8225:
----------------------------------

This problem is becoming a deal-breaker for any Solr cluster. The larger the 
cluster becomes, the higher the likelihood of at least one replica being 
unhealthy/slow/recovering. As it stands, indexing comes to a grinding halt 
when one or more replicas are recovering. 
To begin this fix, we MUST at least add a setting where the leader does not 
send the update to a recovering replica at all. The replica should get that 
update from wherever it's recovering from.

[[email protected]] Can you please comment on the best way to handle this, so 
that we can take this on and submit the patch?
This patch and https://issues.apache.org/jira/browse/SOLR-8227 need to be 
considered together.

> Leader should send update requests to replicas in recovery asynchronously
> -------------------------------------------------------------------------
>
>                 Key: SOLR-8225
>                 URL: https://issues.apache.org/jira/browse/SOLR-8225
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Timothy Potter
>
> When a replica goes into recovery, the leader still sends docs to that 
> replica while it is recovering. What I'm seeing is that the recovering node 
> is still slow to respond to the leader (at least slower than the healthy 
> replicas). Thus it would be good if the leader could send the updates to the 
> recovering replica asynchronously, i.e. the leader will block as it does 
> today when forwarding updates to healthy / active replicas, but send updates 
> to recovering replicas async, thus preventing the whole update request from 
> being slowed down by a potentially degraded replica.
> FWIW - I've actually seen this occur in an environment that has more than 3 
> replicas per shard. One of the replicas went into recovery and then was much 
> slower to handle requests than the healthy replicas, but the leader had to 
> wait for the slowest replica.
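
The behavior requested in the description above could be sketched roughly as 
follows. This is a minimal illustration, not Solr code: the class 
LeaderForwardingSketch, the Replica type, the State enum and the forward 
method are all hypothetical names made up for this example, and a plain 
ExecutorService stands in for whatever the leader actually uses to send 
updates. The idea shown is only that the leader waits for ACTIVE replicas 
and does not wait for RECOVERING ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch; none of these types exist in Solr.
class LeaderForwardingSketch {
    enum State { ACTIVE, RECOVERING }

    /** Hypothetical replica handle: a name, a state, and a (possibly slow) send operation. */
    static final class Replica {
        final String name;
        final State state;
        final Consumer<String> sender;

        Replica(String name, State state, Consumer<String> sender) {
            this.name = name;
            this.state = state;
            this.sender = sender;
        }
    }

    /**
     * Forwards one update to all replicas. Blocks until every ACTIVE replica
     * has handled it; sends to RECOVERING replicas are started but not awaited.
     * Returns the futures of the un-awaited sends so the caller can track them.
     */
    static List<CompletableFuture<Void>> forward(
            String update, List<Replica> replicas, ExecutorService pool) {
        List<CompletableFuture<Void>> mustWait = new ArrayList<>();
        List<CompletableFuture<Void>> background = new ArrayList<>();
        for (Replica r : replicas) {
            CompletableFuture<Void> f =
                    CompletableFuture.runAsync(() -> r.sender.accept(update), pool);
            if (r.state == State.ACTIVE) {
                mustWait.add(f);
            } else {
                background.add(f);
            }
        }
        // Only the healthy replicas gate the update request.
        CompletableFuture.allOf(mustWait.toArray(new CompletableFuture[0])).join();
        return background;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Replica> replicas = Arrays.asList(
                new Replica("replica1", State.ACTIVE, u -> { }),
                new Replica("replica2", State.RECOVERING, u -> {
                    // Simulate a slow, recovering replica.
                    try {
                        Thread.sleep(500);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
        List<CompletableFuture<Void>> pending = forward("doc-1", replicas, pool);
        System.out.println("leader returned; background sends: " + pending.size());
        pending.forEach(CompletableFuture::join);
        pool.shutdown();
    }
}
```

A real change would also have to decide what happens when one of the 
background sends fails, which this sketch leaves to the caller.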



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
