[ https://issues.apache.org/jira/browse/SOLR-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000554#comment-15000554 ]
Mark Miller commented on SOLR-7989:
-----------------------------------

Hmm...doesn't this test create the problem itself? It puts a bunch of replicas in LIR manually, causing the DOWN state. Then it manually clears LIR and tries to see who will become leader. This is not a normal case?

In any case, I will revert the current fix. Now I think the test may be creating the problem itself, but regardless, this change really breaks things.

> Down replica elected leader, stays down after successful election
> -----------------------------------------------------------------
>
>                 Key: SOLR-7989
>                 URL: https://issues.apache.org/jira/browse/SOLR-7989
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Mark Miller
>             Fix For: 5.4, Trunk
>
>         Attachments: DownLeaderTest.java, DownLeaderTest.java,
> SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch,
> SOLR-7989.patch, SOLR-8233.patch
>
>
> It is possible that a down replica gets elected as a leader, and that it
> stays down after the election.
> Here's how I hit upon this:
> * There are 3 replicas: leader, notleader0, notleader1
> * Introduced network partition to isolate notleader0, notleader1 from leader
> (leader puts these two in LIR via zk).
> * Kill leader, remove partition. Now leader is dead, and both notleader0
> and notleader1 are down. There is no leader.
> * Remove LIR znodes in zk.
> * Wait a while, and a (flawed?) leader election happens.
> * Finally, the state is such that one of notleader0 or notleader1 (which were
> down before) becomes leader, but stays down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org