Raintung Li created SOLR-6498:
---------------------------------

             Summary: LeaderElector sometimes creates multiple ephemeral 
nodes in ZooKeeper
                 Key: SOLR-6498
                 URL: https://issues.apache.org/jira/browse/SOLR-6498
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 4.6.1
         Environment: linux
            Reporter: Raintung Li


Sometimes the election path (overseer_elect or collection_shard_leader_elect) 
contains multiple ephemeral nodes for the same node but with different 
session IDs, for example:
92427566579253248-core_node1-n_0000000032
92427566579253249-core_node1-n_0000000033
I have not been able to trace how this happens. When it does, the newly 
registered node cannot be elected leader. We know the ephemeral node with the 
old session ID is invalid, but not why it still exists.

A second issue is in the joinElection method:
      try {
        leaderSeqPath = zkClient.create(shardsElectZkPath + "/" + id + "-n_", null,
            CreateMode.EPHEMERAL_SEQUENTIAL, false);
        context.leaderSeqPath = leaderSeqPath;
        cont = false;
      } catch (ConnectionLossException e) {
        // we don't know if we made our node or not...
        List<String> entries = zkClient.getChildren(shardsElectZkPath, null, true);

        boolean foundId = false;
        for (String entry : entries) {
          String nodeId = getNodeId(entry);
          if (id.equals(nodeId)) {
            // we did create our node...
            foundId = true;
            break;
          }
        }
        if (!foundId) {
          cont = true;
          if (tries++ > 20) {
            throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR,
                "", e);
          }
          try {
            Thread.sleep(50);
          } catch (InterruptedException e2) {
            Thread.currentThread().interrupt();
          }
        }
      }

If a ConnectionLossException occurs here, the ephemeral sequential node may 
be created twice.

My suggestion: even though I can't trace why two ephemeral nodes are created 
for the same server, we can guard against it.
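
One possible guard is sketched below, under assumptions. It assumes entry 
names follow the <sessionId>-<nodeName>-n_<sequence> pattern seen in the 
example above. The class and method names (ElectionNodeCheck, staleEntries, 
alreadyRegistered) are hypothetical and are not existing Solr code. The idea 
is that joinElection could list the children of the election path and treat 
entries with the same node name but a different session ID as stale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Solr.
class ElectionNodeCheck {
    // Assumed entry format: <sessionId>-<nodeName>-n_<sequence>
    private static final Pattern ENTRY = Pattern.compile("(-?\\d+)-(.+)-n_(\\d+)");

    // Returns entries that belong to nodeName but were created by another session.
    static List<String> staleEntries(List<String> entries, String nodeName,
                                     long currentSessionId) {
        List<String> stale = new ArrayList<>();
        for (String entry : entries) {
            Matcher m = ENTRY.matcher(entry);
            if (!m.matches()) {
                continue; // ignore names that do not follow the assumed format
            }
            long sessionId = Long.parseLong(m.group(1));
            if (m.group(2).equals(nodeName) && sessionId != currentSessionId) {
                stale.add(entry);
            }
        }
        return stale;
    }

    // Returns true if the current session already has an entry for nodeName,
    // in which case a retry after connection loss should not create another.
    static boolean alreadyRegistered(List<String> entries, String nodeName,
                                     long currentSessionId) {
        for (String entry : entries) {
            Matcher m = ENTRY.matcher(entry);
            if (m.matches()
                    && m.group(2).equals(nodeName)
                    && Long.parseLong(m.group(1)) == currentSessionId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
            "92427566579253248-core_node1-n_0000000032",
            "92427566579253249-core_node1-n_0000000033");
        // Treat the second entry's session as the live one.
        System.out.println(staleEntries(entries, "core_node1", 92427566579253249L));
        System.out.println(alreadyRegistered(entries, "core_node1", 92427566579253249L));
    }
}
```

In joinElection, the entries returned by staleEntries could then be deleted 
through the ZooKeeper client. That step is left out of the sketch because it 
depends on a live session.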




