[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.

Ramkumar Aiyengar (JIRA) Fri, 18 Sep 2015 12:03:20 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876156#comment-14876156
 ]


Ramkumar Aiyengar commented on SOLR-8069:
-----------------------------------------

We hit this case when we cold stopped and restarted the cloud. This was on 
4.10.4, so it may no longer apply. Say you have two replicas, R1 and R2.

* R1 is the leader and both R1 and R2 are stopped at the same time.
* R2 stops accepting requests but hasn't updated ZK yet. When R1 sends an 
update to R2, it fails and R1 puts R2 in LIR.
* R2 shuts down first, then R1.
* R1 starts up first, finds it should be the leader.
* R2 decides it should follow and tries to recover.
* R1 decides it can't be leader due to LIR and steps down. But by then R2 is in 
recovery and doesn't step up, so no one steps forward to lead the shard.
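
As a rough illustration, the sequence above can be replayed against a toy 
state model. This is not Solr code: the ShardModel class, its methods, and the 
state strings are all hypothetical, and R1's step-down is recorded as an event 
exactly as described rather than derived from any real election rule. The only 
assumption carried over from the issue is that the LIR marker survives a 
restart.

```python
# Toy replay of the timeline above. NOT Solr code: every name here is
# hypothetical and exists only to make the ordering of events explicit.

class ShardModel:
    def __init__(self, replicas, leader):
        self.state = {r: "active" for r in replicas}  # per-replica state
        self.lir = set()       # replicas marked for leader-initiated recovery
        self.leader = leader

    def mark_lir(self, replica):
        # Assumption: the marker is persisted outside the nodes (the issue
        # speaks of "LIR nodes"), so stopping a replica does not clear it.
        self.lir.add(replica)

    def stop(self, replica):
        self.state[replica] = "down"
        if self.leader == replica:
            self.leader = None

    def start_as_candidate(self, replica):
        self.state[replica] = "candidate"

    def start_recovery(self, replica):
        self.state[replica] = "recovering"

    def step_down(self, replica):
        self.state[replica] = "stepped_down"
        if self.leader == replica:
            self.leader = None

    def has_candidate(self):
        # True if some replica is still willing to take leadership.
        return any(s == "candidate" for s in self.state.values())


def replay():
    shard = ShardModel(["R1", "R2"], leader="R1")
    shard.mark_lir("R2")            # R1's update to R2 fails during shutdown
    shard.stop("R2")                # R2 shuts down first
    shard.stop("R1")                # then R1
    shard.start_as_candidate("R1")  # R1 starts first, expects to lead
    shard.start_recovery("R2")      # R2 decides to follow and recover
    shard.step_down("R1")           # R1 steps down due to LIR (as reported)
    return shard


if __name__ == "__main__":
    shard = replay()
    print(shard.leader, shard.state, shard.lir, shard.has_candidate())
```

At the end of the replay the model has no leader, no replica still willing to 
stand for election, and the LIR marker is still present, which is the stuck 
state described above.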

> Leader Initiated Recovery can put the replica with the latest data into LIR 
> and a shard will have no leader even on restart.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8069
>                 URL: https://issues.apache.org/jira/browse/SOLR-8069
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Mark Miller
>         Attachments: SOLR-8069.patch, SOLR-8069.patch
>
>
> I've seen this twice now. Need to work on a test.
> When some issues hit all the replicas at once, you can end up in a situation 
> where the rightful leader was put or put itself into LIR. Even on restart, 
> this rightful leader won't take leadership and you have to manually clear the 
> LIR nodes.
> It seems that if all the replicas participate in election on startup, LIR 
> should just be cleared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
