[
https://issues.apache.org/jira/browse/STORM-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340333#comment-14340333
]
Michael Noll commented on STORM-695:
------------------------------------
To fix the storm CLI script: Essentially we must updated the
{{exec_storm_class()}} function to understand and pass exit codes of the
commands it ran back to the caller.
Idea:
{code}
$ diff -u storm storm.fixed
--- storm 2015-02-24 13:59:03.000000000 +0000
+++ storm.fixed 2015-02-27 15:47:38.373536529 +0000
@@ -179,11 +179,13 @@
"-cp", get_classpath(extrajars),
] + jvmopts + [klass] + list(args)
print("Running: " + " ".join(all_args))
+ return_code = 1 # default return code indicates failure
if fork:
- os.spawnvp(os.P_WAIT, JAVA_CMD, all_args)
+ return_code = os.spawnvp(os.P_WAIT, JAVA_CMD, all_args)
else:
# handling whitespaces in JAVA_CMD
- sub.call(all_args)
+ return_code = sub.call(all_args)
+ return return_code
def jar(jarfile, klass, *args):
"""Syntax: [storm jar topology-jar-path class ...]
@@ -490,7 +492,7 @@
def unknown_command(*args):
print("Unknown command: [storm %s]" % ' '.join(sys.argv[1:]))
print_usage()
- sys.exit(254)
+ return 254
COMMANDS = {"jar": jar, "kill": kill, "shell": shell, "nimbus": nimbus, "ui":
ui, "logviewer": logviewer,
"drpc": drpc, "supervisor": supervisor, "localconfvalue":
print_localconfvalue,
@@ -532,7 +534,8 @@
parse_config(config_list)
COMMAND = args[0]
ARGS = args[1:]
- (COMMANDS.get(COMMAND, unknown_command))(*ARGS)
+ return_code = (COMMANDS.get(COMMAND, unknown_command))(*ARGS)
+ sys.exit(return_code)
if __name__ == "__main__":
main()
{code}
As I said above this diff/patch alone is not sufficient because the Storm
commands themselves (e.g. kill_topology.clj) are not returning proper
success/failure information.
> storm CLI tool reports zero exit code on error scenario, take 2
> ---------------------------------------------------------------
>
> Key: STORM-695
> URL: https://issues.apache.org/jira/browse/STORM-695
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 0.9.3
> Reporter: Michael Noll
> Priority: Minor
>
> Commands such as "storm kill non-existing-topology" will return an exit code
> of zero, indicating success when in fact the command failed.
> h3. How to reproduce
> Here is but one example where the {{storm}} CLI tool violates shell best
> practices:
> {code}
> # Let's kill a topology that is in fact not running in the cluster.
> $ storm kill i-do-not-exist-topo
> <snip>
> Exception in thread "main" NotAliveException(msg:i-do-not-exist-topo is not
> alive)
> # Print the exit code of last command.
> $ echo $?
> 0 # <<< but since the kill command failed this should be non-zero!
> {code}
> Another example is the "storm jar" command. If you attempt to submit a
> topology that has the same name as an existing, running topology, the "storm
> jar" command will not submit the topology -- instead it will print an
> exception (think: "the topology FooName is already running"), which is ok,
> but it will then exit with a return code of zero, which indicates success
> (which is wrong).
> h3. Impact
> This bug prevents automated deployment tools such as Ansible or Puppet as
> well as ad-hoc CLI scripting (think: fire-fighting Ops teams) to work
> properly because Storm violates shell conventions by not returning non-zero
> exit codes in case of failures.
> h3. How to fix
> From what I understand the solution is two-fold:
> # The various Storm commands that are being called by the {{storm}} script
> must return proper exit codes.
> # The {{storm}} script must store these exit codes and return itself with the
> respective exit code of the Storm command it actually ran.
> For example, here's the current code that implements the "storm kill" command:
> {code}
> # In: bin/storm
> def kill(*args):
> """Syntax: [storm kill topology-name [-w wait-time-secs]]
> Kills the topology with the name topology-name. Storm will
> first deactivate the topology's spouts for the duration of
> the topology's message timeout to allow all messages currently
> being processed to finish processing. Storm will then shutdown
> the workers and clean up their state. You can override the length
> of time Storm waits between deactivation and shutdown with the -w flag.
> """
> exec_storm_class(
> "backtype.storm.command.kill_topology",
> args=args,
> jvmtype="-client",
> extrajars=[USER_CONF_DIR, STORM_BIN_DIR])
> {code}
> which in turn calls the following code in {{kill_topology.clj}}:
> {code}
> ;; In: backtype.storm.command.kill-topology
> (ns backtype.storm.command.kill-topology
> (:use [clojure.tools.cli :only [cli]])
> (:use [backtype.storm thrift config log])
> (:import [backtype.storm.generated KillOptions])
> (:gen-class))
> (defn -main [& args]
> (let [[{wait :wait} [name] _] (cli args ["-w" "--wait" :default nil
> :parse-fn #(Integer/parseInt %)])
> opts (KillOptions.)]
> (if wait (.set_wait_secs opts wait))
> (with-configured-nimbus-connection nimbus
> (.killTopologyWithOpts nimbus name opts)
> (log-message "Killed topology: " name)
> )))
> {code}
> which in turn calls the following code in {{nimbus.clj}}:
> {code}
> ;; In: backtype.storm.daemon.nimbus
> (^void killTopologyWithOpts [this ^String storm-name ^KillOptions
> options]
> (check-storm-active! nimbus storm-name true)
> (let [topology-conf (try-read-storm-conf-from-name conf storm-name
> nimbus)]
> (check-authorization! nimbus storm-name topology-conf
> "killTopology"))
> (let [wait-amt (if (.is_set_wait_secs options)
> (.get_wait_secs options)
> )]
> (transition-name! nimbus storm-name [:kill wait-amt] true)
> ))
> {code}
> As you can see the current implementation does not pass success/failure
> information back to the caller.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)