[ https://issues.apache.org/jira/browse/SOLR-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991446#comment-14991446 ]
Ishan Chattopadhyaya commented on SOLR-7989:
--------------------------------------------

bq. Attached the updated patch, running full suite of tests now.

I have run the full suite many times now. Some test or other fails on each run, but none of the failures is reproducible. I think this is good to go in.

I just realized that I've written a very similar test for SOLR-7569 (ForceLeaderTest). One of the tests in that suite does exactly what DownLeaderTest does here. Like the test here, that test also fails without this patch. The only difference is that in that test the call to FORCELEADER clears the LIR state, whereas DownLeaderTest here clears the LIR state directly.

I think we should ignore the test here; when SOLR-7569 goes in, we'll have a test that covers the code path taken in this patch.

[~noble.paul], [~markrmil...@gmail.com] Can you please review and commit this? Thanks.

> Down replica elected leader, stays down after successful election
> -----------------------------------------------------------------
>
>                 Key: SOLR-7989
>                 URL: https://issues.apache.org/jira/browse/SOLR-7989
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Noble Paul
>         Attachments: DownLeaderTest.java, DownLeaderTest.java,
> SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch, SOLR-7989.patch,
> SOLR-8233.patch
>
>
> It is possible that a down replica gets elected as a leader, and that it
> stays down after the election.
> Here's how I hit upon this:
> * There are 3 replicas: leader, notleader0, notleader1.
> * Introduce a network partition to isolate notleader0 and notleader1 from
> leader (leader puts these two in LIR via zk).
> * Kill leader, remove partition. Now leader is dead, and both notleader0
> and notleader1 are down. There is no leader.
> * Remove the LIR znodes in zk.
> * Wait a while, and a (flawed?) leader election happens.
> * Finally, the state is such that one of notleader0 or notleader1 (both of
> which were down before) becomes leader, but stays down.
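
The end state described in the issue (a replica that holds the leader role while its state is still down) can be checked mechanically. The following is a minimal sketch, not the attached DownLeaderTest.java: the function name `down_leaders` and the dictionary layout are assumptions made for illustration, loosely modeled on a per-collection cluster state with `shards`, `replicas`, a `state` field, and a `leader` flag. The replica names are taken from the report; the sample data is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def down_leaders(collection_state: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (shard, replica) pairs that are flagged leader but not 'active'.

    The input layout is an assumption for this sketch, not a confirmed schema.
    """
    bad = []
    for shard_name, shard in collection_state.get("shards", {}).items():
        for replica_name, replica in shard.get("replicas", {}).items():
            # The leader flag is treated as the string "true" (assumed format).
            is_leader = str(replica.get("leader", "false")).lower() == "true"
            if is_leader and replica.get("state") != "active":
                bad.append((shard_name, replica_name))
    return sorted(bad)


# Hypothetical end state from the report: notleader0 won the election
# but its published state never left "down".
example_state = {
    "shards": {
        "shard1": {
            "replicas": {
                "leader": {"state": "down"},
                "notleader0": {"state": "down", "leader": "true"},
                "notleader1": {"state": "down"},
            }
        }
    }
}

print(down_leaders(example_state))  # [('shard1', 'notleader0')]
```

A healthy collection, where every leader is active, yields an empty list, so a test could assert on that after waiting for recovery.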