[
https://issues.apache.org/jira/browse/SOLR-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103645#comment-15103645
]
Shai Erera commented on SOLR-7844:
----------------------------------
[[email protected]] this seems to break upgrading existing 5.x (e.g. 5.3)
clusters to 5.4, unless I missed a "migration" step. If you do a rolling
upgrade, taking one node down, replacing its JARs with the 5.4 ones, and
restarting it, you'll see exceptions like this:
{noformat}
org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1
at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1034)
at org.apache.solr.cloud.ZkController.register(ZkController.java:940)
at org.apache.solr.cloud.ZkController.register(ZkController.java:883)
at org.apache.solr.core.ZkContainer$2.run(ZkContainer.java:184)
at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:213)
at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:696)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:750)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:716)
at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:623)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:204)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:184)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:438)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
...
Caused by: org.apache.solr.common.SolrException: Could not get leader props
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1081)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1045)
at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:1001)
... 35 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/acg-test-1/leaders/shard1/leader
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1059)
{noformat}
When the 5.4 nodes come up, they don't find the
{{/collections/coll/leaders/shard1/leader}} path and fail. I am not quite sure
how to recover from this, though, since the cluster has a mixture of 5.3 and
5.4 nodes. I cannot create {{../shard1/leader}} since {{../shard1}} is an
EPHEMERAL node and therefore cannot have child nodes. I am not sure what will
happen if I delete {{../shard1}} and recreate it as non-EPHEMERAL: will the old
5.3 nodes still work? I also need to ensure that the new 5.4 node doesn't
become the leader if it wasn't already.
Perhaps a fix would be for 5.4 to fall back to reading the leader info from
{{../shard1}}? Then, once the last 5.3 node is down, a 5.4 node will attempt
leadership and recreate the leader path in the 5.4 format?
Should this have been a zk version change?
I'd appreciate some guidance here.
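
The fallback idea above could look roughly like the sketch below. This is only an illustration, not Solr's code or the eventual fix: {{LeaderPropsFallback}}, {{ZNodeReader}} and the helper names are hypothetical, the ZooKeeper read is replaced by a small interface so the example runs on its own, and the pre-5.4 location is assumed to be the parent of the path in the stack trace ({{/collections/<coll>/leaders/<shard>}}).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: prefer the 5.4 leader path, fall back to the pre-5.4 one. */
public class LeaderPropsFallback {

    /** Stand-in for a ZooKeeper read; an empty result means the znode does not exist. */
    public interface ZNodeReader {
        Optional<byte[]> read(String path);
    }

    /** 5.4 layout (taken from the NoNode path in the stack trace). */
    public static String newLeaderPath(String collection, String shard) {
        return "/collections/" + collection + "/leaders/" + shard + "/leader";
    }

    /** Assumed pre-5.4 layout: the leader props live on the shard znode itself. */
    public static String oldLeaderPath(String collection, String shard) {
        return "/collections/" + collection + "/leaders/" + shard;
    }

    /** Reads leader props from the new path, falling back to the old one if it is missing. */
    public static Optional<byte[]> readLeaderProps(ZNodeReader zk, String collection, String shard) {
        Optional<byte[]> props = zk.read(newLeaderPath(collection, shard));
        if (props.isPresent()) {
            return props;
        }
        // No 5.4-style node yet: a 5.3 node may still be leader, so try the old location.
        return zk.read(oldLeaderPath(collection, shard));
    }

    public static void main(String[] args) {
        // In-memory map simulating a cluster whose leader is still a 5.3 node.
        Map<String, byte[]> znodes = new HashMap<>();
        znodes.put(oldLeaderPath("acg-test-1", "shard1"),
                "leader-props-written-by-5.3".getBytes(StandardCharsets.UTF_8));
        ZNodeReader zk = path -> Optional.ofNullable(znodes.get(path));

        String found = readLeaderProps(zk, "acg-test-1", "shard1")
                .map(b -> new String(b, StandardCharsets.UTF_8))
                .orElse("<no leader>");
        System.out.println(found);
    }
}
```

A real implementation would catch {{KeeperException.NoNodeException}} from the ZooKeeper read instead of checking an {{Optional}}, and would still need to answer the open questions above about mixed 5.3/5.4 leadership.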
> Zookeeper session expiry during shard leader election can cause multiple
> leaders.
> ---------------------------------------------------------------------------------
>
> Key: SOLR-7844
> URL: https://issues.apache.org/jira/browse/SOLR-7844
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.10.4
> Reporter: Mike Roberts
> Assignee: Mark Miller
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-7844-5x.patch, SOLR-7844.patch, SOLR-7844.patch,
> SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch,
> SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch
>
>
> If the ZooKeeper session expires for a host during shard leader election, the
> ephemeral leader_elect nodes are removed. However the threads that were
> processing the election are still present (and could believe the host won the
> election). They will then incorrectly create leader nodes once a new
> ZooKeeper session is established.
> This introduces a subtle race condition that could cause two hosts to become
> leader.
> Scenario:
> a three machine cluster, all of the machines are restarting at approximately
> the same time.
> The first machine starts, writes a leader_elect ephemeral node, it's the only
> candidate in the election so it wins and starts the leadership process. As it
> knows it has peers, it begins to block waiting for the peers to arrive.
> During this period of blocking[1] the ZK connection drops and the session
> expires.
> A new ZK session is established, and ElectionContext.cancelElection is
> called. Then register() is called and a new set of leader_elect ephemeral
> nodes are created.
> During the period between the ZK session expiring, and new set of
> leader_elect nodes being created the second machine starts.
> It creates its leader_elect ephemeral nodes, as there are no other nodes it
> wins the election and starts the leadership process. As it's still missing one
> of its peers, it begins to block waiting for the third machine to join.
> There is now a race between machine1 & machine2, both of whom think they are
> the leader.
> So far, this isn't too bad, because the machine that loses the race will fail
> when it tries to create the /collection/name/leader/shard1 node (as it
> already exists), and will rejoin the election.
> While this is happening, machine3 has started and has queued for leadership
> behind machine2.
> If the loser of the race is machine2, when it rejoins the election it cancels
> the current context, deleting its leader_elect ephemeral nodes.
> At this point, machine3 believes it has become leader (the watcher it has on
> the leader_elect node fires), and it runs the LeaderElector::checkIfIAmLeader
> method. This method DELETES the current /collection/name/leader/shard1 node,
> then starts the leadership process (as all three machines are now running, it
> does not block to wait).
> So, machine1 won the race with machine2 and declared its leadership and
> created the nodes. However, machine3 has just deleted them, and recreated
> them for itself. So machine1 and machine3 both believe they are the leader.
> I am thinking that the fix should be to cancel & close all election contexts
> immediately on reconnect (we do cancel them, however it's run serially which
> has blocking issues, and just canceling does not cause the wait loop to
> exit). That election context logic already has checks on the closed flag, so
> they should exit if they see it has been closed.
> I'm working on a patch for this.
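
The "check the closed flag in the wait loop" idea from the description above can be sketched as follows. This is a simplified illustration and not Solr's ElectionContext code: the class {{ClosableElectionWait}}, its {{Outcome}} enum and the polling parameters are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a peer-wait loop that exits as soon as its context is closed. */
public class ClosableElectionWait {

    public enum Outcome { PEERS_ARRIVED, TIMED_OUT, CLOSED }

    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** Called on ZooKeeper reconnect so that a stale election stops waiting. */
    public void close() {
        closed.set(true);
    }

    /**
     * Polls until peers arrive, the timeout elapses, or close() is called.
     * A caller should only go on to create the leader node on PEERS_ARRIVED
     * (or, by policy, TIMED_OUT), never on CLOSED.
     */
    public Outcome waitForPeers(BooleanSupplier peersArrived, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!closed.get()) {
            if (peersArrived.getAsBoolean()) {
                return Outcome.PEERS_ARRIVED;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt and treat it like a close request.
                Thread.currentThread().interrupt();
                return Outcome.CLOSED;
            }
        }
        return Outcome.CLOSED;
    }

    public static void main(String[] args) {
        ClosableElectionWait ctx = new ClosableElectionWait();
        ctx.close(); // simulate session expiry handling closing the old context
        System.out.println(ctx.waitForPeers(() -> false, 1000, 10));
    }
}
```

The point of the distinct {{CLOSED}} outcome is that a thread left over from an expired session can tell it must not proceed to register itself as leader.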