[
https://issues.apache.org/jira/browse/CURATOR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194731#comment-16194731
]
Jordan Zimmerman commented on CURATOR-435:
------------------------------------------
That error message is benign. For ZK 3.4.x Curator will create standard parent
ZNodes. In 3.5.x it will create Container nodes. So, this isn't the source of
the problem.
The actual problem looks to be in LeaderLatch#getLeader(). I thought this was
fixed in CURATOR-358 but maybe not.
> ZNodes are created with incompatible CreateMode under racing conditions
> -----------------------------------------------------------------------
>
> Key: CURATOR-435
> URL: https://issues.apache.org/jira/browse/CURATOR-435
> Project: Apache Curator
> Issue Type: Bug
> Components: Client, Recipes
> Affects Versions: 2.12.0
> Environment: Apache Storm 1.1.1 (uses Zookeeper 3.4.6 and Curator
> 2.12.0)
> Apache Zookeeper 3.4.10
> OpenJDK 1.8.0_131
> Debian 8
> Docker 17.09.0-ce
> Rancher 1.6.9
> Reporter: Daniel Klessing
> Attachments: zk_leader-lock_nimbus.log,
> zk_leader-lock_SUCCESS_nimbus.log, zk_leader-lock_SUCCESS_zookeeper.log,
> zk_leader-lock_zookeeper-1.log, zk_leader-lock_zookeeper-2.log,
> zk_leader-lock_zookeeper-3.log
>
>
> We have a microservice-based software stack that starts its core services
> relatively simultaneously. For this we rely on Docker and Rancher.
> We noticed that occasionally the Apache Storm Nimbus, that relies on Apache
> Curator's LeaderLatch, cannot agree on a leader. This comes due a missing
> ZNode:
> {code}
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /storm/leader-lock
> at
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453)
> ~[storm-core-1.1.1.jar:1.1.1]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_131]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_131]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_131]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> ~[clojure-1.7.0.jar:?]
> at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313)
> ~[clojure-1.7.0.jar:?]
> at
> org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296)
> ~[storm-core-1.1.1.jar:1.1.1]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_131]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_131]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_131]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> ~[clojure-1.7.0.jar:?]
> at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313)
> ~[clojure-1.7.0.jar:?]
> at
> org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
> ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18)
> ~[storm-core-1.1.1.jar:1.1.1]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [?:1.8.0_131]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [?:1.8.0_131]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 11:29:13.229 [timer] INFO org.apache.storm.daemon.nimbus - not a leader,
> skipping assignments
> 11:29:13.229 [timer] INFO org.apache.storm.daemon.nimbus - not a leader,
> skipping cleanup
> {code}
> even though a create request has been issued for this ZNode:
> {code}
> 2017-10-06 11:27:04,387 [myid:3] - INFO [ProcessThread(sid:3
> cport:-1)::PrepRequestProcessor@648] - Got user-level KeeperException when
> processing sessionid:0x35ef170475e0001 type:create cxid:0x1 zxid:0x100000013
> txntype:-1 reqpath:n/a Error Path:/storm/leader-lock Error:KeeperErrorCode =
> NoNode for /storm/leader-lock
> {code}
> Verifying the Zookeeper data via {{zkCli.sh}} shows that the ZNode is in fact
> not present:
> {code}
> [zk: localhost:2181(CONNECTED) 0] ls /storm
> [assignments, backpressure, nimbuses, logconfigs, storms, errors,
> supervisors, workerbeats, blobstore]
> {code}
> We noticed, that in such a case, the following log entry in the Nimbus log is
> *missing*
> {code}
> [Curator-Framework-0] WARN
> org.apache.storm.shade.org.apache.curator.utils.ZKPaths - The version of
> ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT
> will be used instead.
> {code}
> and if everything works, the log entry is *present*.
> We assume that the correct {{CreateMode}} for the ZNode someone cannot be
> determined (as "Container" ZNodes are only available in Zookeeper 3.5.1
> onwards).
> Please notice the startup phase in which several attempts to connect to
> Zookeeper are not successful until it is fully reachable.
> Full logs are attached. Zookeeper is run as a 3-node cluster in this setup.
> Happens with a single node instance also.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)