happylu created STORM-1940:
------------------------------

             Summary: Storm topology automatically re-balances after ZK RECONNECTED
                 Key: STORM-1940
                 URL: https://issues.apache.org/jira/browse/STORM-1940
             Project: Apache Storm
          Issue Type: Bug
    Affects Versions: 1.0.1
            Reporter: happylu
            Priority: Critical


I have a topology with 2 workers on 2 VMs. When the ZooKeeper connection goes through RECONNECTED, the Storm topology automatically re-balances.
The log shows NodeExists for /meta/712285. I suspect the cause is this: after reconnecting successfully, TridentSpoutCoordinator creates this node again, but the node had already been created before the reconnect.
Can we check whether the node exists first? Or avoid throwing this exception, so that it does not force the whole topology to re-balance?
{code}
06-29 05:54:37.515 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] shade.org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae1, negotiated timeout = 10000
06-29 05:54:37.515 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]-EventThread] apache.curator.framework.state.ConnectionStateManager [INFO] State change: RECONNECTED
06-29 05:54:37.519 [Thread-133-spout-DataKafkaSpout1466801942228-executor[154 154]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae5, negotiated timeout = 10000
06-29 05:54:37.519 [Thread-133-spout-DataKafkaSpout1466801942228-executor[154 154]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
06-29 05:54:37.524 [Thread-25-spout-DataKafkaSpout1466801942228-executor[156 156]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae4, negotiated timeout = 10000
06-29 05:54:37.524 [Thread-25-spout-DataKafkaSpout1466801942228-executor[156 156]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
06-29 05:54:37.528 [main-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] shade.org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7b556f0cc3a40896, negotiated timeout = 10000
06-29 05:54:37.528 [main-EventThread] apache.curator.framework.state.ConnectionStateManager [INFO] State change: RECONNECTED
06-29 05:54:37.528 [Thread-149-spout-DataKafkaSpout1466801942228-executor[160 160]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae3, negotiated timeout = 10000
06-29 05:54:37.528 [Thread-149-spout-DataKafkaSpout1466801942228-executor[160 160]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
06-29 05:54:37.536 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]] org.apache.storm.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
        at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:452) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:418) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.daemon.executor$fn__7953$fn__7966$fn__8019.invoke(executor.clj:847) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.util$async_loop$fn__625.invoke(util.clj:484) [storm-core-1.0.1.jar:1.0.1]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
Caused by: java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
        at org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:119) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.trident.spout.TridentSpoutCoordinator.execute(TridentSpoutCoordinator.java:71) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.daemon.executor$fn__7953$tuple_action_fn__7955.invoke(executor.clj:728) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.daemon.executor$mk_task_receiver$fn__7874.invoke(executor.clj:461) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.disruptor$clojure_handler$reify__7390.onEvent(disruptor.clj:40) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:439) ~[storm-core-1.0.1.jar:1.0.1]
        ... 6 more
Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
        at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:721) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:704) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:701) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:477) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:467) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.trident.topology.state.TransactionalState.forPath(TransactionalState.java:83) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.trident.topology.state.TransactionalState.createNode(TransactionalState.java:95) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:115) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.trident.spout.TridentSpoutCoordinator.execute(TridentSpoutCoordinator.java:71) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.daemon.executor$fn__7953$tuple_action_fn__7955.invoke(executor.clj:728) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.daemon.executor$mk_task_receiver$fn__7874.invoke(executor.clj:461) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.disruptor$clojure_handler$reify__7390.onEvent(disruptor.clj:40) ~[storm-core-1.0.1.jar:1.0.1]
        at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:439) ~[storm-core-1.0.1.jar:1.0.1]
        ... 6 more
{code}
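Below is a minimal sketch of the second suggestion: treat NodeExists as "node already written" and overwrite it, rather than letting the exception kill the executor. It is not Storm's TransactionalState code. ZkStore, FakeStore, setDataIdempotent and the local NodeExistsException are all hypothetical stand-ins, so the sketch compiles without ZooKeeper or Curator.

The trace shows the create going through Curator's RetryLoop.callWithRetry. One plausible reading, which is not confirmed, is that the server applied the first create before the connection dropped, and the retried create then hit NodeExists. FakeStore simulates that case. With real Curator, the analogous calls would be checkExists().forPath(...), create().forPath(...) and setData().forPath(...), catching KeeperException.NodeExistsException.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class IdempotentSetDataSketch {

    /** Hypothetical stand-in for ZooKeeper's KeeperException.NodeExistsException. */
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("KeeperErrorCode = NodeExists for " + path);
        }
    }

    /** Hypothetical minimal client interface; not Storm's or Curator's API. */
    interface ZkStore {
        boolean exists(String path);
        void create(String path, byte[] data) throws NodeExistsException;
        void setData(String path, byte[] data);
    }

    /**
     * In-memory fake. When simulateLostCreateAck is true, the next create()
     * stores the node and then throws NodeExists, mimicking a retried create
     * whose first attempt already succeeded on the server.
     */
    static class FakeStore implements ZkStore {
        final Map<String, byte[]> nodes = new HashMap<>();
        boolean simulateLostCreateAck;

        public boolean exists(String path) {
            return nodes.containsKey(path);
        }

        public void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.containsKey(path)) {
                throw new NodeExistsException(path);
            }
            nodes.put(path, data);
            if (simulateLostCreateAck) {
                simulateLostCreateAck = false;
                // The "retry" finds the node that the first attempt created.
                throw new NodeExistsException(path);
            }
        }

        public void setData(String path, byte[] data) {
            nodes.put(path, data);
        }
    }

    /** Writes data at path, creating the node if needed; NodeExists is not fatal. */
    static void setDataIdempotent(ZkStore zk, String path, byte[] data) {
        if (zk.exists(path)) {
            zk.setData(path, data);
            return;
        }
        try {
            zk.create(path, data);
        } catch (NodeExistsException e) {
            // The node appeared between the check and the create (possibly our
            // own un-acknowledged earlier attempt): overwrite instead of failing.
            zk.setData(path, data);
        }
    }

    public static void main(String[] args) {
        FakeStore zk = new FakeStore();
        zk.simulateLostCreateAck = true;
        setDataIdempotent(zk, "/meta/712285", "v1".getBytes(StandardCharsets.UTF_8));
        // prints "v1"
        System.out.println(new String(zk.nodes.get("/meta/712285"), StandardCharsets.UTF_8));
    }
}
```

The exists() check only avoids a failed create in the common case. The catch block is what removes the check-then-act race, because the node can still appear between the two calls.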



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
