[ 
https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745737#comment-14745737
 ] 

Anshum Gupta commented on SOLR-8034:
------------------------------------

Thanks for fixing the assert.

bq. replica will not realize it's down on its own since the partition is between 
the leader and the replica, not between the replica and zookeeper – so it won't 
be set to down until the leader tries to forward the document to it and fails

Right, should've realized that.

Also, about my opinion being split: I wasn't sold on this at first, but after 
thinking about it more, it makes more sense to go with this.

Thanks [~mewmewball]. LGTM overall, I'll commit this.

> If minRF is not satisfied, leader should not put replicas in recovery
> ---------------------------------------------------------------------
>
>                 Key: SOLR-8034
>                 URL: https://issues.apache.org/jira/browse/SOLR-8034
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Jessica Cheng Mallet
>              Labels: solrcloud
>         Attachments: SOLR-8034.patch, SOLR-8034.patch
>
>
> If the minimum replication factor parameter (minRf) in a Solr update request 
> is not satisfied -- i.e. if the update was not successfully applied on at 
> least minRf replicas -- the shard leader should not put the failed replicas 
> in "leader initiated recovery" and the client should retry the update 
> instead.
> This is so that in the scenario where minRf is not satisfied, the failed 
> replicas can still be eligible to become a leader in case of leader failure, 
> since from the client's perspective this update did not succeed.
> This came up from a network partition scenario where the leader became 
> sectioned off from its two followers, but all three could still talk to 
> ZooKeeper. The partitioned leader put its two followers in leader initiated 
> recovery, so we couldn't just kill off the partitioned node and have a 
> follower take over leadership. For a minRf=1 case, this is the correct 
> behavior because the partitioned leader would have accepted updates that the 
> followers don't have, and therefore we can't switch leadership or we'd lose 
> those updates. However, in the case of minRf=2, Solr never accepted any 
> update from the client's point of view, so in fact the partitioned leader 
> doesn't have any accepted update that the followers don't have, and therefore 
> the followers should be eligible to become leaders. Thus, I'm proposing 
> modifying the leader initiated recovery logic to not put the followers in 
> recovery if the minRf parameter is present and is not satisfied.
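
For illustration only, the proposed rule can be sketched roughly as below. This 
is not the code in SOLR-8034.patch and does not use Solr's real classes: the 
class name, the method replicasToRecover and its parameters are all 
hypothetical. It also assumes that the achieved replication factor counts the 
leader itself, which is how the minRf=1 case in the description reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the proposed rule; not Solr's actual implementation. */
public class MinRfRecoverySketch {

    /**
     * Returns the replicas the leader should put into leader initiated recovery.
     *
     * @param requestedMinRf minRf from the update request, or null if absent
     * @param achievedRf     copies that applied the update (assumed to include the leader)
     * @param failedReplicas replicas the leader could not forward the update to
     */
    static List<String> replicasToRecover(Integer requestedMinRf,
                                          int achievedRf,
                                          List<String> failedReplicas) {
        boolean minRfRequested = requestedMinRf != null;
        if (minRfRequested && achievedRf < requestedMinRf) {
            // minRf was requested but not met: the client is expected to retry,
            // so the failed replicas stay eligible for leader election.
            return new ArrayList<>();
        }
        // No minRf, or minRf satisfied: the update counts as accepted, so the
        // replicas that missed it must recover before they can lead.
        return new ArrayList<>(failedReplicas);
    }

    public static void main(String[] args) {
        List<String> failed = Arrays.asList("follower1", "follower2");
        // Partitioned leader, minRf=2: only the leader applied the update.
        System.out.println(replicasToRecover(2, 1, failed));
        // Partitioned leader, minRf=1: the leader alone satisfies minRf.
        System.out.println(replicasToRecover(1, 1, failed));
        // No minRf in the request: existing behavior is unchanged.
        System.out.println(replicasToRecover(null, 1, failed));
    }
}
```

In the partition scenario from the description, the first call leaves both 
followers out of recovery, while the other two calls return both followers.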



