[ 
https://issues.apache.org/jira/browse/STORM-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770086#comment-16770086
 ] 

Stig Rohde Døssing commented on STORM-2415:
-------------------------------------------

Storm (2.x and 1.x) was upgraded to 3.4.13 recently, but the change was rolled 
back due to a regression in Zookeeper. See 
https://issues.apache.org/jira/browse/STORM-3233. It looks like Zookeeper is 
currently running a 3.4.14 release vote, so once that is released we can 
upgrade.

[~ramzi.jamal] Just for completeness, do you have a link to the Zookeeper issue 
causing this? The issue linked in the description for this issue only seems to 
apply to Zookeeper 3.5.x.

> Storm fails to properly handle Zookeeper hosts going down
> ---------------------------------------------------------
>
>                 Key: STORM-2415
>                 URL: https://issues.apache.org/jira/browse/STORM-2415
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.3
>         Environment: All
>            Reporter: Anthony Milbourne
>            Priority: Major
>
> We run a storm cluster (v.1.0.3) on AWS and have 3 Zookeepers supporting it. 
> Because AWS sometimes terminates VMs, we sometimes lose a Zookeeper instance. 
> When this happens, the hostname cannot be resolved for that zookeeper 
> instance as AWS has taken the VM away. We noticed that in this case storm 
> fails to connect to zookeeper – even though there are still 2 Zookeeper 
> instances running. It fails with an exception something like:
> {noformat}
> java.net.UnknownHostException: zookeeper3
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1280) 
>   at java.net.InetAddress.getAllByName(InetAddress.java:1192) 
>   at java.net.InetAddress.getAllByName(InetAddress.java:1126) 
>   at 
> org.apache.storm.shade.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
>  
>   at 
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>  
>   at 
> org.apache.storm.shade.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
>  
>   at 
> org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
>  
>   at 
> org.apache.storm.shade.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
>  
>   at 
> org.apache.storm.shade.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
>  
>   at 
> org.apache.storm.shade.org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
>  
>   at 
> org.apache.storm.shade.org.apache.curator.ConnectionState.start(ConnectionState.java:103)
>  
>   at 
> org.apache.storm.shade.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:190)
>  
>   at 
> org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
>  
>   at org.apache.storm.zookeeper$mk_client.doInvoke(zookeeper.clj:86) 
>   at clojure.lang.RestFn.invoke(RestFn.java:494)
>   at 
> org.apache.storm.cluster_state.zookeeper_state_factory$_mkState.invoke(zookeeper_state_factory.clj:28)
>  
>   at org.apache.storm.cluster_state.zookeeper_state_factory.mkState(Unknown 
> Source) 
>   <SNIP REST OF STACKTRACE>
> {noformat}
> Having done some research it looks like this error is caused by a bug in the 
> Zookeeper client library. There is an issue for it here:
> [https://issues.apache.org/jira/browse/ZOOKEEPER-1576]
> This issue has been resolved in the version 3.5.x branch of Zookeeper. 
> However, after 2.5 years and 3 releases the 3.5.x branch of Zookeeper is 
> still in Alpha .
> Despite the fact that it is in alpha, there is a branch of Curator (v.3.x.x) 
> that uses it, but Storm uses Curator version 2.x.x – possibly because it 
> doesn’t rely on alpha code. So the bug is still unpatched in Storm.
> I realise that an upgrade to alpha code may be too much of a risk, but this 
> problem is a serious issue for those running Storm in a containerised or 
> cloud environment - so perhaps it may be worth considering?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to