[ https://issues.apache.org/jira/browse/STORM-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602511#comment-14602511 ]
caofangkun commented on STORM-840:
----------------------------------

{code}
java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No such file or directory
{code}
It looks like the supervisor is unable to find the "kill" command.
Could you check whether "kill" exists in your Docker instance? For example:
{code}
[root@3c46f330f114 /usr/share]# which kill
/usr/bin/kill
{code}

> My supervisor crashes when I kill a topology
> --------------------------------------------
>
> Key: STORM-840
> URL: https://issues.apache.org/jira/browse/STORM-840
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 0.9.4
> Environment: I have a test cluster of 3 servers based on Debian.
> Each server runs Storm inside a Docker container.
> 2 servers are supervisors only.
> 1 server is nimbus+UI+supervisor.
> I use Oracle JVM 8u45.
> Reporter: Damien DESMARETS
> Labels: crash, stability
>
> Hello,
> I run 3 topologies inside my cluster.
> Sometimes, when I kill one of them (no specific one), one supervisor goes
> down and restarts. After a few restarts, it becomes stable.
> The topology process is in "Zombie state" in the process list.
> In version 0.9.3, all the supervisors crashed and couldn't restart. To
> resolve this, I had to "rm -fr <storm-local-dir>/workers/"
> So I migrated to 0.9.4 (I thought the cause was STORM-682).
> Now it still happens, not every time but occasionally.
> I have these logs in supervisor.log:
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id
> nlp-11-1432906756
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id
> nlp-11-1432906756
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state
> for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time:
> 1432911702.
> State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702,
> :storm-id "nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port
> 6700}
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state
> for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time:
> 1432911702. State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702,
> :storm-id "nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port
> 6700}
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down
> 90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down
> 90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
> 2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
> java.io.IOException: Cannot run program "kill" (in directory "."): error=2,
> No such file or directory
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> ~[na:1.8.0_45]
>     at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
>     at
> org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58)
> ~[commons-exec-1.1.jar:1.1]
>     at
> org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254)
> ~[commons-exec-1.1.jar:1.1]
>     at
> org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319)
> ~[commons-exec-1.1.jar:1.1]
>     at
> org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160)
> ~[commons-exec-1.1.jar:1.1]
>     at
> org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147)
> ~[commons-exec-1.1.jar:1.1]
>     at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at backtype.storm.util$send_signal_to_process.invoke(util.clj:415)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at
> backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at
> backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at
> backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>     at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
>     at clojure.core$partial$fn__4190.doInvoke(core.clj:2396)
> ~[clojure-1.5.1.jar:na]
>     at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
>     at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.io.IOException: error=2, No such file or directory
>     at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
>     at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ~[na:1.8.0_45]
>     ... 19 common frames omitted
> 2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
> java.io.IOException: Cannot run program "kill" (in directory "."): error=2,
> No such file or directory
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> ~[na:1.8.0_45]
>     at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
>     at
> org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58)
> ~[commons-exec-1.1.jar:1.1]
>     at
> org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254)
> ~[commons-exec-1.1.jar:1.1]
>     at
> org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319)
> ~[commons-exec-1.1.jar:1.1]
>     at
> org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160)
> ~[commons-exec-1.1.jar:1.1]
>     at
> org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147)
> ~[commons-exec-1.1.jar:1.1]
>     at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at backtype.storm.util$send_signal_to_process.invoke(util.clj:415)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at
> backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at
> backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at
> backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>     at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
>     at clojure.core$partial$fn__4190.doInvoke(core.clj:2396)
> ~[clojure-1.5.1.jar:na]
>     at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
>     at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40)
> ~[storm-core-0.9.4.jar:0.9.4]
>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.io.IOException: error=2, No such file or directory
>     at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
>     at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ~[na:1.8.0_45]
>     ... 19 common frames omitted
> 2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing
> an event")
> java.lang.RuntimeException: ("Error when processing an event")
>     at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> [storm-core-0.9.4.jar:0.9.4]
>     at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>     at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48)
> [storm-core-0.9.4.jar:0.9.4]
>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> 2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing
> an event")
> java.lang.RuntimeException: ("Error when processing an event")
>     at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> [storm-core-0.9.4.jar:0.9.4]
>     at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>     at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48)
> [storm-core-0.9.4.jar:0.9.4]
>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor
> 90f0964b-c48c-4cbc-9d1c-57119c56e99c
> 2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor
> 90f0964b-c48c-4cbc-9d1c-57119c56e99c
> 2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
> 2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:host.name=storm-supervisor-01
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:java.version=1.8.0_45
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:java.vendor=Oracle Corporation
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:java.home=/usr/lib/jvm/jre-8-oracle-x64/jre
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:java.class.path=/usr/share/apache-storm-0.9.4/lib/zookeeper-3.4.6.jar:/usr/share/apache-storm-0.9.4/lib/hiccup-0.3.6.jar:/usr/share/apache-storm-0.9.4/lib/chill-java-0.3.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-exec-1.1.jar:/usr/share/apache-storm-0.9.4/lib/tools.macro-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/jgrapht-core-0.9.0.jar:/usr/share/apache-storm-0.9.4/lib/ring-servlet-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/clout-1.0.1.jar:/usr/share/apache-storm-0.9.4/lib/storm-core-0.9.4.jar:/usr/share/apache-storm-0.9.4/lib/asm-4.0.jar:/usr/share/apache-storm-0.9.4/lib/tools.cli-0.2.4.jar:/usr/share/apache-storm-0.9.4/lib/disruptor-2.10.1.jar:/usr/share/apache-storm-0.9.4/lib/log4j-over-slf4j-1.6.6.jar:/usr/share/apache-storm-0.9.4/lib/clj-time-0.4.1.jar:/usr/share/apache-storm-0.9.4/lib/slf4j-api-1.7.5.jar:/usr/share/apache-storm-0.9.4/lib/clojure-1.5.1.jar:/usr/share/apache-storm-0.9.4/lib/core.incubator-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/json-simple-1.1.jar:/usr/share/apache-storm-0.9.4/lib/logback-classic-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/servlet-api-2.5.jar:/usr/share/apache-storm-0.9.4/lib/logback-core-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/jetty-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/clj-stacktrace-0.2.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-devel-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/minlog-1.2.jar:/usr/share/apache-storm-0.9.4/lib/kryo-2.21.jar:/usr/share/apache-storm-0.9.4/lib/compojure-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/commons-codec-1.6.jar:/usr/share/apache-storm-0.9.4/lib/tools.logging-0.2.3.jar:/usr/share/apache-storm-0.9.4/lib/ring-jetty-adapter-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/jetty-util-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/joda-time-2.0.jar:/usr/share/apache-storm-0.9.4/lib/jline-2.11.jar:/usr/share/apache-storm-0.9.4/lib/commons-logging-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/reflectasm-1.07-shaded.jar:/usr/share/apache-storm-0.9.4/lib/carbonite-1.4.0.jar:/usr/share/apache-storm-0.9.4/lib/snakeyaml-1.11.jar:/usr/share/apache-storm-0.9.4/lib/objenesis-1.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-core-1.1.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-io-2.4.jar:/usr/share/apache-storm-0.9.4/lib/commons-fileupload-1.2.1.jar:/usr/share/apache-storm-0.9.4/lib/math.numeric-tower-0.0.1.jar:/usr/share/apache-storm-0.9.4/lib/commons-lang-2.5.jar:/usr/share/apache-storm-0.9.4/conf
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:java.io.tmpdir=/tmp
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:java.compiler=<NA>
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.name=Linux
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.arch=amd64
> 2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client
> environment:os.version=3.16.0-0.bpo.4-amd64
> ...

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
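
A note on the "which kill" check suggested in the comment: in most shells "kill" is also a builtin, so typing "kill" at a prompt can succeed even when no kill executable is installed. The stack trace shows the supervisor starting "kill" as a child process through Runtime.exec, which needs a real file on the PATH of the supervisor process. The following is only a minimal sketch of such a check, assuming a POSIX shell is available inside the container; find_on_path is a hypothetical helper name, not part of Storm.

```shell
#!/bin/sh
# Sketch: look for an executable file named "$1" in each PATH directory.
# Shell builtins are ignored on purpose, because a child process started
# from the JVM cannot use them; it needs a file on disk.
# find_on_path is a hypothetical helper, not something shipped with Storm.
find_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry stands for the current directory.
        candidate=${dir:-.}/$name
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            IFS=$old_ifs
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

if kill_path=$(find_on_path kill); then
    echo "kill executable found: $kill_path"
else
    echo "no kill executable on PATH ($PATH)"
fi
```

Run it as the same user and with the same environment as the supervisor, since the PATH of an interactive root shell may differ from the PATH the supervisor was started with. If nothing is found on a Debian-based image, the procps package is the usual provider of /bin/kill, but that is worth verifying for the specific image.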