happylu created STORM-1940:
------------------------------
Summary: Storm Topo is auto re-balance after ZK RECONNECTED
Key: STORM-1940
URL: https://issues.apache.org/jira/browse/STORM-1940
Project: Apache Storm
Issue Type: Bug
Affects Versions: 1.0.1
Reporter: happylu
Priority: Critical
I have a topology with 2 workers on 2 VMs. When the ZooKeeper connection goes
through a RECONNECTED state change, the Storm topology is automatically
rebalanced.
The log shows NodeExists for /meta/712285. I guess the cause is: after
reconnecting successfully, TridentSpoutCoordinator creates this node again, but
the node was already created before the reconnect.
Can we check whether the node exists first? Or avoid throwing this exception,
which causes the whole topology to rebalance?
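To make the request above concrete, here is a minimal sketch of a write that treats "node already exists" as a normal outcome instead of a fatal error. It is only an illustration under assumptions: InMemoryStore, NodeExistsException and createOrUpdate are hypothetical stand-ins written for this example, not Storm, Curator, or ZooKeeper APIs, and the in-memory map replaces a real ZooKeeper connection. The sketch attempts the create and falls back to an update when the node is already there, which would also cover the case where a create retried after a reconnect finds that the earlier attempt had already been applied.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only: every name in this file is hypothetical, not a Storm or Curator API. */
final class IdempotentZkWrite {

    /** Stand-in for ZooKeeper's NodeExists error. */
    static final class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("KeeperErrorCode = NodeExists for " + path);
        }
    }

    /** Tiny in-memory stand-in for a ZooKeeper client (assumption for the example). */
    static final class InMemoryStore {
        private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();

        /** Fails if the node is already present, like a ZooKeeper create. */
        void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null) {
                throw new NodeExistsException(path);
            }
        }

        /** Overwrites an existing node; fails if the node is missing. */
        void setData(String path, byte[] data) {
            if (nodes.replace(path, data) == null) {
                throw new IllegalStateException("NoNode for " + path);
            }
        }

        byte[] getData(String path) {
            return nodes.get(path);
        }
    }

    /**
     * Writes data to path whether or not the node already exists.
     * Returns true if the node was created, false if it was updated.
     */
    static boolean createOrUpdate(InMemoryStore store, String path, byte[] data) {
        try {
            store.create(path, data);
            return true;
        } catch (NodeExistsException e) {
            // The node is already there (for example, written before a reconnect):
            // update it instead of letting the exception kill the executor.
            store.setData(path, data);
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        String path = "/meta/712285";
        createOrUpdate(store, path, "before-reconnect".getBytes(StandardCharsets.UTF_8));
        // A second write to the same path no longer throws NodeExists.
        createOrUpdate(store, path, "after-reconnect".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.getData(path), StandardCharsets.UTF_8));
    }
}
```

Whether the real fix belongs in TransactionalState.createNode, in its caller, or elsewhere is not something this sketch decides; it only shows the fallback pattern being asked for.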
{code}
06-29 05:54:37.515 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] shade.org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae1, negotiated timeout = 10000
06-29 05:54:37.515 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]-EventThread] apache.curator.framework.state.ConnectionStateManager [INFO] State change: RECONNECTED
06-29 05:54:37.519 [Thread-133-spout-DataKafkaSpout1466801942228-executor[154 154]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae5, negotiated timeout = 10000
06-29 05:54:37.519 [Thread-133-spout-DataKafkaSpout1466801942228-executor[154 154]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
06-29 05:54:37.524 [Thread-25-spout-DataKafkaSpout1466801942228-executor[156 156]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae4, negotiated timeout = 10000
06-29 05:54:37.524 [Thread-25-spout-DataKafkaSpout1466801942228-executor[156 156]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
06-29 05:54:37.528 [main-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] shade.org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7b556f0cc3a40896, negotiated timeout = 10000
06-29 05:54:37.528 [main-EventThread] apache.curator.framework.state.ConnectionStateManager [INFO] State change: RECONNECTED
06-29 05:54:37.528 [Thread-149-spout-DataKafkaSpout1466801942228-executor[160 160]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae3, negotiated timeout = 10000
06-29 05:54:37.528 [Thread-149-spout-DataKafkaSpout1466801942228-executor[160 160]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
06-29 05:54:37.536 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]] org.apache.storm.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
    at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:452) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:418) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.daemon.executor$fn__7953$fn__7966$fn__8019.invoke(executor.clj:847) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.util$async_loop$fn__625.invoke(util.clj:484) [storm-core-1.0.1.jar:1.0.1]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
Caused by: java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
    at org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:119) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.trident.spout.TridentSpoutCoordinator.execute(TridentSpoutCoordinator.java:71) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.daemon.executor$fn__7953$tuple_action_fn__7955.invoke(executor.clj:728) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.daemon.executor$mk_task_receiver$fn__7874.invoke(executor.clj:461) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.disruptor$clojure_handler$reify__7390.onEvent(disruptor.clj:40) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:439) ~[storm-core-1.0.1.jar:1.0.1]
    ... 6 more
Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
    at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:721) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:704) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:701) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:477) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:467) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.trident.topology.state.TransactionalState.forPath(TransactionalState.java:83) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.trident.topology.state.TransactionalState.createNode(TransactionalState.java:95) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:115) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.trident.spout.TridentSpoutCoordinator.execute(TridentSpoutCoordinator.java:71) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.daemon.executor$fn__7953$tuple_action_fn__7955.invoke(executor.clj:728) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.daemon.executor$mk_task_receiver$fn__7874.invoke(executor.clj:461) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.disruptor$clojure_handler$reify__7390.onEvent(disruptor.clj:40) ~[storm-core-1.0.1.jar:1.0.1]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:439) ~[storm-core-1.0.1.jar:1.0.1]
    ... 6 more
{code}