[ 
https://issues.apache.org/jira/browse/SOLR-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154976#comment-15154976
 ] 

Mark Miller commented on SOLR-8697:
-----------------------------------

bq. cancelElection() and runLeaderProcess() can race with each other. If the 
local process is trying to cancel right as it becomes leader, cancelElection() 
won't see a leaderZkNodeParentVersion yet, so it won't try to delete the leader 
registration. Meanwhile, runLeaderProcess() still succeeds in creating the 
leader registration. The call to super.cancelElection() does remove us from the 
queue, but the dead leader registration is left there.
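
For illustration only, the interleaving quoted above can be modeled with a minimal in-memory sketch. This is not Solr's code: the class LeaderRaceSketch, the zk set standing in for ZooKeeper, and the node paths are all hypothetical, and the two methods only mimic the behavior described in the quote. The "race" is forced by calling cancelElection() before runLeaderProcess() has recorded leaderZkNodeParentVersion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the race; NOT Solr's actual implementation.
class LeaderRaceSketch {
    // In-memory stand-in for ZooKeeper: the set of node paths that exist.
    static final Set<String> zk = new HashSet<>();

    // Illustrative paths only; real Solr paths differ.
    static final String QUEUE_NODE = "/election/n_0000000001";
    static final String LEADER_NODE = "/leader";

    // Null until the leader registration has been created.
    static Integer leaderZkNodeParentVersion = null;

    static void joinElection() {
        zk.add(QUEUE_NODE);
    }

    // Mimics the quoted behavior: the leader node is only deleted
    // if a parent version has already been recorded.
    static void cancelElection() {
        if (leaderZkNodeParentVersion != null) {
            zk.remove(LEADER_NODE);
        }
        zk.remove(QUEUE_NODE); // stands in for super.cancelElection()
    }

    // Mimics the quoted behavior: creates the registration, then records the version.
    static void runLeaderProcess() {
        zk.add(LEADER_NODE);
        leaderZkNodeParentVersion = 0;
    }

    public static void main(String[] args) {
        joinElection();
        // Forced interleaving: cancel runs before the version is recorded.
        cancelElection();
        runLeaderProcess();
        System.out.println("queue entry present: " + zk.contains(QUEUE_NODE));
        System.out.println("leader registration present: " + zk.contains(LEADER_NODE));
    }
}
```

In this model the queue entry is gone but the leader registration remains, which is the "dead leader registration" state described in the quote. A stress test would presumably need to force this ordering deterministically rather than rely on timing.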

Any thoughts on why the existing stress tests for leader election can't catch 
this? Can we beef something up?

> Fix LeaderElector issues
> ------------------------
>
>                 Key: SOLR-8697
>                 URL: https://issues.apache.org/jira/browse/SOLR-8697
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.4.1
>            Reporter: Scott Blum
>            Assignee: Mark Miller
>              Labels: patch, reliability, solrcloud
>         Attachments: SOLR-8697.patch
>
>
> This patch is still somewhat WIP for a couple of reasons:
> 1) Still debugging test failures.
> 2) This will need more scrutiny from knowledgeable folks!
> There are some subtle bugs with the current implementation of LeaderElector, 
> best demonstrated by the following test:
> 1) Start up a small single-node solrcloud.  It should become Overseer.
> 2) kill -9 the solrcloud process and immediately start a new one.
> 3) The new process won't become overseer.  The old process's ZK leader elect 
> node has not yet disappeared, and the new process fails to set appropriate 
> watches.
> NOTE: this is only reproducible if the new node is able to start up and join 
> the election quickly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

