[jira] [Commented] (SOLR-7989) Down replica elected leader, stays down after successful election

Ishan Chattopadhyaya (JIRA) Thu, 12 Nov 2015 07:33:37 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002241#comment-15002241
 ]


Ishan Chattopadhyaya commented on SOLR-7989:
--------------------------------------------

bq.  I don't think this is a valid test at all. Why does it clear LIR while 
Solr is running? That is not a valid real world scenario.
Indeed, that's why I didn't include it in the patch. I just wanted to point out 
that there is a problem, but I couldn't reproduce it in a test in any 
other way that represents a real-world situation. However, the mere fact that 
two messages were being passed as one (the LEADER message had a state param) 
indicated to me that we should, in theory, split them out.

+1 to your patch; I hadn't considered the core registration scenario. Also, 
limiting the change to only DOWN to ACTIVE makes sense, as does excluding the 
RECOVERING to ACTIVE change.
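
To make the two points above concrete, here is a rough sketch of how I read 
them. It is not the code in the patch: the class, method, and message key names 
are all hypothetical and are not taken from Solr. It assumes that (a) the 
leader announcement and the state change go out as two separate messages, and 
(b) only a DOWN replica is moved to ACTIVE on election, while a RECOVERING one 
is left alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical model for illustration; none of these names come from Solr. */
final class LeaderStateSketch {

    /** Simplified replica states, named after the ones mentioned in this thread. */
    enum ReplicaState { DOWN, RECOVERING, ACTIVE }

    /**
     * State the newly elected leader should end up in, assuming only
     * DOWN is promoted to ACTIVE and every other state is left unchanged.
     */
    static ReplicaState stateAfterElection(ReplicaState current) {
        return current == ReplicaState.DOWN ? ReplicaState.ACTIVE : current;
    }

    /**
     * Builds separate messages instead of one LEADER message carrying a
     * state param. The keys ("operation", "core", "state") are assumptions
     * made for this sketch.
     */
    static List<Map<String, String>> messagesForNewLeader(String core, ReplicaState current) {
        List<Map<String, String>> messages = new ArrayList<>();

        // Message 1: announce the leader, with no state attached.
        Map<String, String> leader = new LinkedHashMap<>();
        leader.put("operation", "leader");
        leader.put("core", core);
        messages.add(leader);

        // Message 2: a state change, sent only if the state actually changes.
        ReplicaState next = stateAfterElection(current);
        if (next != current) {
            Map<String, String> state = new LinkedHashMap<>();
            state.put("operation", "state");
            state.put("core", core);
            state.put("state", next.name().toLowerCase(Locale.ROOT));
            messages.add(state);
        }
        return messages;
    }

    public static void main(String[] args) {
        for (ReplicaState s : ReplicaState.values()) {
            System.out.println(s + " -> " + messagesForNewLeader("core1", s));
        }
    }
}
```

With this shape, a replica elected while DOWN gets an explicit, separate state 
message, and a replica elected while RECOVERING gets only the leader message.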

> Down replica elected leader, stays down after successful election
> -----------------------------------------------------------------
>
>                 Key: SOLR-7989
>                 URL: https://issues.apache.org/jira/browse/SOLR-7989
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Mark Miller
>             Fix For: 5.4, Trunk
>
>         Attachments: DownLeaderTest.java, DownLeaderTest.java, 
> SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, 
> SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, SOLR-8233.patch
>
>
> It is possible that a down replica gets elected as a leader, and that it 
> stays down after the election.
> Here's how I hit upon this:
> * There are 3 replicas: leader, notleader0, notleader1
> * Introduce a network partition to isolate notleader0 and notleader1 from 
> leader (leader puts these two in LIR via zk).
> * Kill leader, remove partition. Now leader is dead, and both notleader0 
> and notleader1 are down. There is no leader.
> * Remove LIR znodes in zk.
> * Wait a while, and a (flawed?) leader election takes place.
> * Finally, the state is such that one of notleader0 or notleader1 (which were 
> down before) becomes leader, but stays down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

