James Xu created STORM-143:
------------------------------
Summary: Launching a process throws away standard out; can hang
Key: STORM-143
URL: https://issues.apache.org/jira/browse/STORM-143
Project: Apache Storm (Incubating)
Issue Type: Bug
Reporter: James Xu
Priority: Minor
https://github.com/nathanmarz/storm/issues/489
https://github.com/nathanmarz/storm/blob/master/src/clj/backtype/storm/util.clj#L349
When we launch a process, standard out is written to a system buffer and does
not appear to be read. Also, nothing is redirected to standard in. This can
have the following effects:
* A worker can hang when initializing (e.g. UnsatisfiedLinkError looking for
  jzmq), and it will be unable to communicate the error as standard out is being
  swallowed.
* A process that writes too much to standard out will block if the buffer fills.
* A process that tries to read from standard in for any reason will block.
Perhaps we can redirect standard out to an .out file, and redirect /dev/null to
the standard in stream of the process?
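As a rough illustration of the proposal above, here is a minimal Java sketch that launches a child with stdout and stderr appended to an .out file and stdin connected to /dev/null. It is not Storm's launch-process code; the class name RedirectedLaunch, the method launch, and the outFile parameter are hypothetical, and the /dev/null path assumes a Unix-like host.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

final class RedirectedLaunch {
    /**
     * Hypothetical helper (not part of Storm): starts the command with its
     * output appended to outFile and its input read from /dev/null.
     */
    static Process launch(List<String> command, File outFile) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout so both end up in the same file.
        pb.redirectErrorStream(true);
        // Append rather than truncate, so output from earlier launches survives.
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(outFile));
        // Any read from stdin sees end-of-file immediately (Unix-like hosts only).
        pb.redirectInput(new File("/dev/null"));
        return pb.start();
    }
}
```

Because the operating system writes straight to the file, nothing accumulates in a pipe buffer, and the parent does not need a reader thread. The file size is not bounded by this sketch.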
----------
nathanmarz: Storm redirects stdout to the logging system. It's worked fine for
us in our topologies.
----------
d2r: In worker.clj, mk-worker calls redirect-stdio-to-slf4j!. That would not
seem to help in cases like the one we are seeing, where the problem occurs
while launching the worker itself.
(defn -main [storm-id assignment-id port-str worker-id]
  (let [conf1 (read-storm-config)
        login_conf_file (System/getProperty "java.security.auth.login.config")
        conf (if login_conf_file
               (merge conf1 {"java.security.auth.login.config" login_conf_file})
               conf1)]
    (validate-distributed-mode! conf)
    (mk-worker conf nil (java.net.URLDecoder/decode storm-id) assignment-id
               (Integer/parseInt port-str) worker-id)))
If anything were to go wrong (CLASSPATH, jvm opts, misconfiguration...) before
-main or before mk-worker, then any output would be lost. The symptom we saw
was that the topology sat around apparently doing nothing, yet there was no log
indicating that the workers were failing to start.
Is there other redirection to logs that I'm missing?
----------
xiaokang: We use bash to launch the worker process and redirect its stdout to a
worker-port.out file. It helped us find the zeromq JNI problem that caused the
JVM to crash without any log.
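The actual launch script is not shown in this thread. As one guess at its shape, the Java sketch below builds a bash command line that appends the worker's stdout and stderr to an .out file. The names BashWrapper, wrap, and outFile are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BashWrapper {
    /**
     * Hypothetical helper: wraps workerCommand so that bash sets up the
     * redirections and then replaces itself with the worker via exec.
     */
    static List<String> wrap(String outFile, List<String> workerCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("bash");
        cmd.add("-c");
        // Inside the script, $0 is the output file and "$@" is the worker command.
        cmd.add("exec \"$@\" >> \"$0\" 2>&1 < /dev/null");
        cmd.add(outFile);
        cmd.addAll(workerCommand);
        return cmd;
    }
}
```

The resulting list can be handed to ProcessBuilder. Since the redirection happens at the file-descriptor level, output written by native code (for example a JNI crash message) should also land in the file.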
----------
nathanmarz: @d2r Yea, that's all I was referring to. If we redirect stdout,
will the code that redirects stdout to the logging system still take effect?
This is important because we can control the size of the logfiles (via the
logback config) but not the size of the redirected stdout file.
----------
d2r: My hunch is that it will work as it does now, except that any messages
that are getting thrown away before that point would go to a file instead. I
can play with it and find out. We wouldn't want to change the redirection, just
restore visibility to any output that might occur prior to the redirection.
There should be some safety valve to control the size of any new .out in case
something goes berserk.
@xiaokang I see how that would work. We also need to make sure redirection
continues to work as it currently does for the above reason.
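One possible form of the safety valve mentioned above, sketched in Java under the assumption that the parent reads the child's stdout itself: copy at most a fixed number of bytes into the .out file, then keep reading and discard the rest so the child never blocks on a full pipe. The names CappedDrain, drain, and maxBytes are hypothetical and nothing here is taken from Storm.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class CappedDrain {
    /**
     * Reads source until end-of-file, writing only the first maxBytes bytes
     * to sink. Returns the number of bytes actually written.
     */
    static long drain(InputStream source, OutputStream sink, long maxBytes) throws IOException {
        byte[] buf = new byte[8192];
        long written = 0;
        int n;
        while ((n = source.read(buf)) != -1) {
            long room = maxBytes - written;
            if (room > 0) {
                int toWrite = (int) Math.min(n, room);
                sink.write(buf, 0, toWrite);
                written += toWrite;
            }
            // Past the cap: the bytes are read and dropped, so the writer keeps running.
        }
        sink.flush();
        return written;
    }
}
```

In practice this would run on its own thread with source set to the child's Process.getInputStream(). It keeps the start of the output, which is where launch failures would show up, at the cost of losing anything after the cap.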
----------
xiaokang: @d2r @nathanmarz In our cluster, Storm's stdout redirection still
works for any System.out output, while JNI errors go to the worker-port.out
file. I think it would be nice to use the same worker-port.log file for the
bash stdout redirection, since logback can control log file size. But it is a
little bit ugly to use bash to launch the worker java process.