[ https://issues.apache.org/jira/browse/STORM-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhuo Liu updated STORM-1114:
----------------------------
Description:
In production, on some Trident topologies, we hit a bug in which workers try
to create a zk-node that already exists or delete a zk-node that has already
been deleted. This causes the worker process to die.
We analyzed the problem and found a race condition in the zk-node create and
delete code of Trident's TransactionalState.
Failure stack traces from worker.log:
{noformat}
Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /ignoreStoredMetadata
    at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:676) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:660) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:656) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:441) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:431) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:239) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:193) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at storm.trident.topology.state.TransactionalState.forPath(TransactionalState.java:83) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at storm.trident.topology.state.TransactionalState.createNode(TransactionalState.java:100) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:115) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    ... 9 more
2015-10-14 18:10:43.786 b.s.util [ERROR] Halting process: ("Worker died")
{noformat}
{noformat}
Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /rainbowHdfsPath
    at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at storm.trident.topology.state.TransactionalState.delete(TransactionalState.java:126) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    ... 12 more
2015-10-14 18:10:28.799 b.s.util [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
{noformat}
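A rough sketch of one way such create and delete calls could be made tolerant
of the race, not necessarily the change that will go into TransactionalState.
Everything below is hypothetical: FakeZk is an in-memory stand-in rather than
the ZooKeeper or Curator API, the two exception classes only borrow the names
seen in the traces above, and setOrCreate / deleteQuietly are illustrative
helpers. The idea is to treat "node already exists" on create and "no node" on
delete as a sign that another worker got there first, instead of letting the
exception kill the worker.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class IdempotentZkOps {
    /** Local stand-in for KeeperException.NodeExistsException. */
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }

    /** Local stand-in for KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) { super("NoNode for " + path); }
    }

    /** Hypothetical in-memory store with ZooKeeper-like create/set/delete errors. */
    static class FakeZk {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null) throw new NodeExistsException(path);
        }

        void setData(String path, byte[] data) throws NoNodeException {
            if (nodes.replace(path, data) == null) throw new NoNodeException(path);
        }

        void delete(String path) throws NoNodeException {
            if (nodes.remove(path) == null) throw new NoNodeException(path);
        }

        boolean exists(String path) { return nodes.containsKey(path); }

        byte[] getData(String path) { return nodes.get(path); }
    }

    /** Writes data, creating the node if needed; losing a create race is not an error. */
    static void setOrCreate(FakeZk zk, String path, byte[] data) {
        while (true) {
            try {
                zk.setData(path, data);
                return;
            } catch (NoNodeException missing) {
                try {
                    zk.create(path, data);
                    return;
                } catch (NodeExistsException raced) {
                    // Another worker created the node in between; retry setData.
                }
            }
        }
    }

    /** Deletes the node; a node that is already gone is treated as success. */
    static void deleteQuietly(FakeZk zk, String path) {
        try {
            zk.delete(path);
        } catch (NoNodeException alreadyGone) {
            // Another worker deleted it first; nothing left to do.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FakeZk zk = new FakeZk();
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                setOrCreate(zk, "/ignoreStoredMetadata", new byte[] {1});
                deleteQuietly(zk, "/ignoreStoredMetadata");
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("both workers finished without an exception");
    }
}
```

With a real client the same pattern would catch the corresponding
KeeperException subclasses around the create and delete calls; whether
swallowing them is safe depends on what each node is used for.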
was:
In production, on some Trident topologies, we hit a bug in which workers try
to create a zk-node that already exists or delete a zk-node that has already
been deleted. This causes the worker process to die.
We analyzed the problem and found a race condition in the zk-node create and
delete code of Trident's TransactionalState.
This has to be fixed.
> Racing condition in trident zookeeper zk-node create/delete
> -----------------------------------------------------------
>
> Key: STORM-1114
> URL: https://issues.apache.org/jira/browse/STORM-1114
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Reporter: Zhuo Liu
> Priority: Minor
--
This message was sent by Atlassian JIRA (v6.3.4#6332)