[ https://issues.apache.org/jira/browse/SOLR-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992163#comment-14992163 ]
Mark Miller commented on SOLR-7989: ----------------------------------- I think I was just hitting this pretty consistently while working on a new ChaosMonkey type test after fixing SOLR-8244. Had a live and down replica at the end that would never recover. Updated to this commit and it no longer happens. > Down replica elected leader, stays down after successful election > ----------------------------------------------------------------- > > Key: SOLR-7989 > URL: https://issues.apache.org/jira/browse/SOLR-7989 > Project: Solr > Issue Type: Bug > Reporter: Ishan Chattopadhyaya > Assignee: Noble Paul > Fix For: 5.4, Trunk > > Attachments: DownLeaderTest.java, DownLeaderTest.java, > SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, > SOLR-8233.patch > > > It is possible that a down replica gets elected as a leader, and that it > stays down after the election. > Here's how I hit upon this: > * There are 3 replicas: leader, notleader0, notleader1 > * Introduced network partition to isolate notleader0, notleader1 from leader > (leader puts these two in LIR via zk). > * Kill leader, remove partition. Now leader is dead, and both of notleader0 > and notleader1 are down. There is no leader. > * Remove LIR znodes in zk. > * Wait a while, and there happens a (flawed?) leader election. > * Finally, the state is such that one of notleader0 or notleader1 (which were > down before) become leader, but stays down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org