Damien DESMARETS created STORM-840:
--------------------------------------
Summary: My supervisor crashes when I kill a topology
Key: STORM-840
URL: https://issues.apache.org/jira/browse/STORM-840
Project: Apache Storm
Issue Type: Bug
Affects Versions: 0.9.4
Environment: I have a test cluster of 3 servers based on Debian.
Each server runs Storm inside a Docker container.
2 servers are supervisors only.
1 server runs nimbus + UI + supervisor.
I use Oracle JVM 8u45.
Reporter: Damien DESMARETS
Hello,
I run 3 topologies in my cluster.
Sometimes, when I kill one of them (no specific one), one supervisor goes down
and restarts. After a few restarts, it becomes stable.
The topology process is left in a "zombie" state in the process list.
In version 0.9.3, all the supervisors crashed and couldn't restart. To resolve
this, I had to run "rm -fr <storm-local-dir>/workers/".
So I migrated to 0.9.4 (I thought the cause was STORM-682).
Now the problem still happens, not every time, but occasionally.
I see these logs in supervisor.log:
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id
nlp-11-1432906756
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id
nlp-11-1432906756
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state
for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time:
1432911702. State: :disallowed, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702, :storm-id
"nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port 6700}
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state
for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time:
1432911702. State: :disallowed, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702, :storm-id
"nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port 6700}
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down
90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down
90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No
such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
~[na:1.8.0_45]
at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
at
org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58)
~[commons-exec-1.1.jar:1.1]
at
org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254)
~[commons-exec-1.1.jar:1.1]
at
org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319)
~[commons-exec-1.1.jar:1.1]
at
org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160)
~[commons-exec-1.1.jar:1.1]
at
org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147)
~[commons-exec-1.1.jar:1.1]
at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386)
~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.util$send_signal_to_process.invoke(util.clj:415)
~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426)
~[storm-core-0.9.4.jar:0.9.4]
at
backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197)
~[storm-core-0.9.4.jar:0.9.4]
at
backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267)
~[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
at clojure.core$partial$fn__4190.doInvoke(core.clj:2396)
~[clojure-1.5.1.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40)
~[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
~[na:1.8.0_45]
... 19 common frames omitted
2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No
such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
~[na:1.8.0_45]
at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
at
org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58)
~[commons-exec-1.1.jar:1.1]
at
org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254)
~[commons-exec-1.1.jar:1.1]
at
org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319)
~[commons-exec-1.1.jar:1.1]
at
org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160)
~[commons-exec-1.1.jar:1.1]
at
org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147)
~[commons-exec-1.1.jar:1.1]
at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386)
~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.util$send_signal_to_process.invoke(util.clj:415)
~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426)
~[storm-core-0.9.4.jar:0.9.4]
at
backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197)
~[storm-core-0.9.4.jar:0.9.4]
at
backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267)
~[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
at clojure.core$partial$fn__4190.doInvoke(core.clj:2396)
~[clojure-1.5.1.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40)
~[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
~[na:1.8.0_45]
... 19 common frames omitted
2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing
an event")
java.lang.RuntimeException: ("Error when processing an event")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48)
[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing
an event")
java.lang.RuntimeException: ("Error when processing an event")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48)
[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor
90f0964b-c48c-4cbc-9d1c-57119c56e99c
2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor
90f0964b-c48c-4cbc-9d1c-57119c56e99c
2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:host.name=storm-supervisor-01
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:java.version=1.8.0_45
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:java.vendor=Oracle Corporation
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:java.home=/usr/lib/jvm/jre-8-oracle-x64/jre
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:java.class.path=/usr/share/apache-storm-0.9.4/lib/zookeeper-3.4.6.jar:/usr/share/apache-storm-0.9.4/lib/hiccup-0.3.6.jar:/usr/share/apache-storm-0.9.4/lib/chill-java-0.3.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-exec-1.1.jar:/usr/share/apache-storm-0.9.4/lib/tools.macro-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/jgrapht-core-0.9.0.jar:/usr/share/apache-storm-0.9.4/lib/ring-servlet-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/clout-1.0.1.jar:/usr/share/apache-storm-0.9.4/lib/storm-core-0.9.4.jar:/usr/share/apache-storm-0.9.4/lib/asm-4.0.jar:/usr/share/apache-storm-0.9.4/lib/tools.cli-0.2.4.jar:/usr/share/apache-storm-0.9.4/lib/disruptor-2.10.1.jar:/usr/share/apache-storm-0.9.4/lib/log4j-over-slf4j-1.6.6.jar:/usr/share/apache-storm-0.9.4/lib/clj-time-0.4.1.jar:/usr/share/apache-storm-0.9.4/lib/slf4j-api-1.7.5.jar:/usr/share/apache-storm-0.9.4/lib/clojure-1.5.1.jar:/usr/share/apache-storm-0.9.4/lib/core.incubator-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/json-simple-1.1.jar:/usr/share/apache-storm-0.9.4/lib/logback-classic-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/servlet-api-2.5.jar:/usr/share/apache-storm-0.9.4/lib/logback-core-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/jetty-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/clj-stacktrace-0.2.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-devel-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/minlog-1.2.jar:/usr/share/apache-storm-0.9.4/lib/kryo-2.21.jar:/usr/share/apache-storm-0.9.4/lib/compojure-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/commons-codec-1.6.jar:/usr/share/apache-storm-0.9.4/lib/tools.logging-0.2.3.jar:/usr/share/apache-storm-0.9.4/lib/ring-jetty-adapter-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/jetty-util-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/joda-time-2.0.jar:/usr/share/apache-storm-0.9.4/lib/jline-2.11.jar:/usr/share/apache-storm-0.9.4/lib/commons-logging-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/reflectasm-1.07-shaded.jar:/usr/share/apache-storm-0.9.4/lib/carbonite-1.4.0.jar:/usr/share/apache-storm-0.9.4/lib/snakeyaml-1.11.jar:/usr/share/apache-storm-0.9.4/lib/objenesis-1.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-core-1.1.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-io-2.4.jar:/usr/share/apache-storm-0.9.4/lib/commons-fileupload-1.2.1.jar:/usr/share/apache-storm-0.9.4/lib/math.numeric-tower-0.0.1.jar:/usr/share/apache-storm-0.9.4/lib/commons-lang-2.5.jar:/usr/share/apache-storm-0.9.4/conf
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:java.io.tmpdir=/tmp
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:java.compiler=<NA>
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.name=Linux
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.arch=amd64
2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
environment:os.version=3.16.0-0.bpo.4-amd64
...
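The stack trace above shows the supervisor failing to launch an external "kill" program with error=2 (ENOENT), which usually means the program could not be found. One possible check, offered only as a sketch and not as a confirmed diagnosis, is to verify that "kill" resolves on the PATH seen by the supervisor process inside the container. The helper names find_executable and diagnose are hypothetical, and the script assumes it is run as the same user and with the same environment as the supervisor.

```python
import os
import shutil


def find_executable(name, path=None):
    """Return the absolute path of `name` if it resolves on `path`, else None.

    When `path` is None, shutil.which falls back to the PATH of the
    current process environment.
    """
    return shutil.which(name, path=path)


def diagnose(name="kill", path=None):
    """Build a one-line report on whether `name` can be found on the search path."""
    search_path = path if path is not None else os.environ.get("PATH", "")
    resolved = find_executable(name, path=search_path)
    if resolved is None:
        return f"'{name}' not found on PATH={search_path!r}"
    return f"'{name}' resolves to {resolved}"


if __name__ == "__main__":
    # Run inside the container, in the supervisor's environment.
    print(diagnose("kill"))
```

If the report says "not found", installing the package that provides a standalone kill binary in the image, or fixing the PATH the supervisor is started with, may be worth trying. This does not explain why the failure is only occasional.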