[ https://issues.apache.org/jira/browse/STORM-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257536#comment-14257536 ]
clay teahouse commented on STORM-602:
-------------------------------------
Further information on the issue:
1) If the Hadoop nodes are not available when the topology starts, you get a
"Worker died" message, and HdfsBolt and the entire topology die.
java.lang.RuntimeException: ("Worker died")
......
2) If the topology is running and the Hadoop nodes then become unavailable, you
get a connection refused error. When the Hadoop nodes become available again,
HdfsBolt never recovers. It keeps logging the following error:
org.apache.storm.hdfs.bolt.HdfsBolt - write/sync failed.
All datanodes ....... are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1008) ~[hadoop-hdfs-2.2.0.jar:na]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823) ~[hadoop-hdfs-2.2.0.jar:na]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475) ~[hadoop-hdfs-2.2.0.jar:na]
If you restart the topology, everything works again and HdfsBolt can write to
the HDFS nodes.
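
Because a restart clears the problem, one possible reading is that the bolt keeps using an output stream whose pipeline has already failed. The following is only a rough sketch of the "discard the stream and open a new one" idea, not the actual HdfsBolt code or a proposed patch. Sink, SinkFactory, and ReopeningWriter are hypothetical names introduced for illustration; they are not Storm or Hadoop types.

```java
import java.io.IOException;

/**
 * Sketch only: retries a write by discarding the failed stream and opening
 * a new one. Sink and SinkFactory are hypothetical stand-ins, not Storm or
 * Hadoop types.
 */
public class ReopeningWriter {

    /** Hypothetical stand-in for an open output stream. */
    public interface Sink {
        void write(byte[] data) throws IOException;
        void close() throws IOException;
    }

    /** Hypothetical stand-in for whatever opens a new stream. */
    public interface SinkFactory {
        Sink open() throws IOException;
    }

    private final SinkFactory factory;
    private final int maxAttempts;
    private Sink current; // null before the first open and after any failure

    public ReopeningWriter(SinkFactory factory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.factory = factory;
        this.maxAttempts = maxAttempts;
    }

    /** Writes data, reopening the sink after each failure, up to maxAttempts. */
    public void write(byte[] data) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (current == null) {
                    current = factory.open();
                }
                current.write(data);
                return;
            } catch (IOException e) {
                last = e;
                discardCurrent(); // never reuse a stream that has failed
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }

    private void discardCurrent() {
        if (current != null) {
            try {
                current.close();
            } catch (IOException ignored) {
                // the stream is already broken; nothing more to do with it
            }
            current = null;
        }
    }
}
```

A real version would presumably also wait between attempts and decide what to do with data that was partially written before the failure; neither is handled in this sketch.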
> HdfsBolt dies when the hadoop node is not available
> ---------------------------------------------------
>
> Key: STORM-602
> URL: https://issues.apache.org/jira/browse/STORM-602
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-hdfs
> Affects Versions: 0.9.3
> Environment: Ubuntu 14.04
> Reporter: clay teahouse
>
> When the Hadoop nodes are not available, HdfsBolt throws the following
> runtime error and dies, and the topology dies with it.
> 12154 [Thread-50-hdfsBolt2] ERROR backtype.storm.util - Halting process: ("Worker died")
> java.lang.RuntimeException: ("Worker died")
> at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:319) [storm-core-0.9.3-SNAPSHOT.jar:0.9.3-SNAPSHOT]
> at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
> at backtype.storm.daemon.worker$fn__4770$fn__4771.invoke(worker.clj:452) [storm-core-0.9.3-SNAPSHOT.jar:0.9.3-SNAPSHOT]
> at backtype.storm.daemon.executor$mk_executor_data$fn__3287$fn__3288.invoke(executor.clj:239) [storm-core-0.9.3-SNAPSHOT.jar:0.9.3-SNAPSHOT]
> at backtype.storm.util$async_loop$fn__458.invoke(util.clj:467) [storm-core-0.9.3-SNAPSHOT.jar:0.9.3-SNAPSHOT]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]