[ https://issues.apache.org/jira/browse/SOLR-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002241#comment-15002241 ]
Ishan Chattopadhyaya commented on SOLR-7989:
--------------------------------------------

bq. I don't think this is a valid test at all. Why does it clear LIR while Solr is running? That is not a valid real world scenario.

Indeed, that's why I didn't include it in the patch. I just wanted to bring out that there is a problem, but then I couldn't replicate it in a test in any other way that represents a real world situation. However, just the fact that there were two messages passed as one (the LEADER message had a state param) indicated to me that we should, in theory, split them out.

+1 to your patch, I hadn't considered the core registration scenario. Also, the change from only DOWN to ACTIVE makes sense, and so does excluding the RECOVERING to ACTIVE change.

> Down replica elected leader, stays down after successful election
> -----------------------------------------------------------------
>
>                 Key: SOLR-7989
>                 URL: https://issues.apache.org/jira/browse/SOLR-7989
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Mark Miller
>             Fix For: 5.4, Trunk
>
>         Attachments: DownLeaderTest.java, DownLeaderTest.java,
> SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch,
> SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, SOLR-8233.patch
>
>
> It is possible that a down replica gets elected as a leader, and that it
> stays down after the election.
> Here's how I hit upon this:
> * There are 3 replicas: leader, notleader0, notleader1
> * Introduced a network partition to isolate notleader0, notleader1 from leader
> (leader puts these two in LIR via zk).
> * Kill leader, remove partition. Now leader is dead, and both
> notleader0 and notleader1 are down. There is no leader.
> * Remove LIR znodes in zk.
> * Wait a while, and a (flawed?) leader election happens.
> * Finally, the state is such that one of notleader0 or notleader1 (which were
> down before) becomes leader, but stays down.
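
The two points made in the comment (send the leader announcement and the state change as separate messages, and move a newly elected leader to ACTIVE only if it was DOWN, never if it was RECOVERING) can be sketched roughly as below. This is a hypothetical illustration and not the code in SOLR-7989.patch: the `ReplicaState` enum, both function names, and the message dictionary keys are all assumptions made for the example.

```python
from enum import Enum


class ReplicaState(Enum):
    # Hypothetical stand-in for the replica states named in the comment.
    DOWN = "down"
    RECOVERING = "recovering"
    ACTIVE = "active"


def state_after_winning_election(current):
    """Return the state a replica would publish after winning the election.

    Only DOWN is promoted to ACTIVE; RECOVERING (and ACTIVE) are left as-is.
    """
    if current is ReplicaState.DOWN:
        return ReplicaState.ACTIVE
    return current


def messages_after_election(core, current):
    """Build separate leader and state messages instead of one combined message."""
    # The leader message carries no state parameter (assumed message shape).
    messages = [{"operation": "leader", "core": core}]
    new_state = state_after_winning_election(current)
    if new_state is not current:
        # The state change, when there is one, goes out as its own message.
        messages.append(
            {"operation": "state", "core": core, "state": new_state.value}
        )
    return messages
```

With this rule, a DOWN replica that wins the election produces a leader message followed by a state message setting it to ACTIVE, while a RECOVERING replica produces only the leader message.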