[jira] [Commented] (SOLR-8697) Fix LeaderElector issues

Mark Miller (JIRA) Tue, 23 Feb 2016 11:47:47 -0800

    [ https://issues.apache.org/jira/browse/SOLR-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159525#comment-15159525 ]


Mark Miller commented on SOLR-8697:
-----------------------------------

Note: I have not seen any failures like this in this test in a long, long time, 
so it may be related to this change. Perhaps it is just a test issue, because 
this retries for something like 60 seconds.

{noformat}
   [junit4] ERROR   66.3s J4  | OverseerTest.testShardLeaderChange <<<
   [junit4]    > Throwable #1: org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([C4618609907C7E14:1A3201FE8AE48BE5]:0)
   [junit4]    >        at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:212)
   [junit4]    >        at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:173)
   [junit4]    >        at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:138)
   [junit4]    >        at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:310)
   [junit4]    >        at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:219)
   [junit4]    >        at org.apache.solr.cloud.OverseerTest$MockZKController.publishState(OverseerTest.java:181)
   [junit4]    >        at org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:841)
   [junit4]    >        at java.lang.Thread.run(Thread.java:745)
   [junit4]    > Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
   [junit4]    >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
   [junit4]    >        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
   [junit4]    >        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
   [junit4]    >        at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:577)
   [junit4]    >        at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:574)
   [junit4]    >        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
   [junit4]    >        at org.apache.solr.common.cloud.SolrZkClient.multi(SolrZkClient.java:574)
   [junit4]    >        at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:195)
   [junit4]    >        at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:49)
   [junit4]    >        at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:42)
   [junit4]    >        at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:178)
   [junit4]    >        ... 45 more
{noformat}
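
The trace shows the ZooKeeper multi call running inside RetryUtil.retryOnThrowable and still ending in NodeExists. As a rough illustration of that retry-until-timeout pattern, here is a minimal sketch. It is not Solr's RetryUtil: the class RetrySketch, the method retryOn, its parameters, and the stand-in NodeExistsException are all hypothetical names chosen for this example.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of retry-until-timeout; not Solr's RetryUtil. */
final class RetrySketch {
    /** Stand-in for ZooKeeper's NodeExistsException (hypothetical). */
    static final class NodeExistsException extends Exception {}

    /**
     * Runs cmd until it succeeds. Retries only when the failure is an
     * instance of retryable, and rethrows the last failure once timeoutMs
     * has elapsed.
     */
    static <T> T retryOn(Class<? extends Exception> retryable, long timeoutMs,
                         long intervalMs, Callable<T> cmd) throws Exception {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            try {
                return cmd.call();
            } catch (Exception e) {
                boolean expired = System.nanoTime() - deadline >= 0;
                if (!retryable.isInstance(e) || expired) {
                    throw e; // not retryable, or out of time
                }
                Thread.sleep(intervalMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulate a stale ephemeral node that disappears before the third attempt.
        String result = retryOn(NodeExistsException.class, 1000, 10, () -> {
            if (++attempts[0] < 3) throw new NodeExistsException();
            return "registered";
        });
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

If the stale node outlives the timeout, the last NodeExistsException propagates to the caller, which is the shape of failure seen in the trace above.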

> Fix LeaderElector issues
> ------------------------
>
>                 Key: SOLR-8697
>                 URL: https://issues.apache.org/jira/browse/SOLR-8697
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.4.1
>            Reporter: Scott Blum
>            Assignee: Mark Miller
>              Labels: patch, reliability, solrcloud
>             Fix For: master
>
>         Attachments: SOLR-8697.patch
>
>
> This patch is still somewhat WIP for a couple of reasons:
> 1) Still debugging test failures.
> 2) This will need more scrutiny from knowledgeable folks!
> There are some subtle bugs with the current implementation of LeaderElector, 
> best demonstrated by the following test:
> 1) Start up a small single-node SolrCloud.  It should become Overseer.
> 2) kill -9 the solrcloud process and immediately start a new one.
> 3) The new process won't become overseer.  The old process's ZK leader elect 
> node has not yet disappeared, and the new process fails to set appropriate 
> watches.
> NOTE: this is only reproducible if the new node is able to start up and join 
> the election quickly.
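
The reproduction steps above hinge on the new process watching the old process's election node until it disappears. The following is a minimal in-memory sketch of the general ZooKeeper election recipe (lowest sequence number leads, every other candidate watches its immediate predecessor). It does not use the ZooKeeper client and is not Solr's LeaderElector or the attached patch; ElectionSketch, ElectionDir, Candidate, and their methods are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-threaded model of predecessor-watch leader election. */
final class ElectionSketch {
    /** In-memory stand-in for the election directory in ZooKeeper. */
    static final class ElectionDir {
        private final TreeMap<Integer, String> nodes = new TreeMap<>();     // seq -> owner
        private final Map<Integer, List<Runnable>> watches = new HashMap<>(); // one-shot delete watches
        private int nextSeq = 0;

        int createEphemeralSequential(String owner) {
            int seq = nextSeq++;
            nodes.put(seq, owner);
            return seq;
        }

        /** Registers a one-shot delete watch; returns false if the node is already gone. */
        boolean watchDelete(int seq, Runnable onDelete) {
            if (!nodes.containsKey(seq)) return false;
            watches.computeIfAbsent(seq, k -> new ArrayList<>()).add(onDelete);
            return true;
        }

        /** Models session expiry removing an ephemeral node and firing its watches. */
        void delete(int seq) {
            nodes.remove(seq);
            List<Runnable> fired = watches.remove(seq);
            if (fired != null) fired.forEach(Runnable::run);
        }

        /** Highest sequence number below seq, or null if seq is the lowest. */
        Integer predecessorOf(int seq) { return nodes.lowerKey(seq); }
    }

    static final class Candidate {
        final String name;
        final ElectionDir dir;
        int seq;
        boolean leader;

        Candidate(String name, ElectionDir dir) { this.name = name; this.dir = dir; }

        void joinElection() {
            seq = dir.createEphemeralSequential(name);
            checkIfLeader();
        }

        void checkIfLeader() {
            while (true) {
                Integer pred = dir.predecessorOf(seq);
                if (pred == null) { leader = true; return; }
                // Watch the predecessor; if it vanished before the watch was set, look again.
                if (dir.watchDelete(pred, this::checkIfLeader)) return;
            }
        }
    }

    public static void main(String[] args) {
        ElectionDir dir = new ElectionDir();
        Candidate old = new Candidate("old-process", dir);
        old.joinElection();
        Candidate fresh = new Candidate("new-process", dir);
        fresh.joinElection();                // old node still present after kill -9
        System.out.println(fresh.leader);    // false: the old node is still ahead
        dir.delete(old.seq);                 // old session finally expires
        System.out.println(fresh.leader);    // true: the watch fired and the re-check won
    }
}
```

In this model the new process only takes over because it left a watch on the stale node; a candidate that joins without setting that watch would never be notified, which corresponds to step 3 above.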



