Craig Hawco created STORM-2194:
----------------------------------
Summary: ReportErrorAndDie doesn't always die
Key: STORM-2194
URL: https://issues.apache.org/jira/browse/STORM-2194
Project: Apache Storm
Issue Type: Bug
Components: storm-core
Affects Versions: 2.0.0, 1.0.2
Reporter: Craig Hawco
I've been trying to track down a cause of some of our issues with some
exceptions leaving Storm workers in a zombified state for some time. I believe
I've isolated the bug to the behaviour in
`:report-error-and-die`/reportErrorAndDie in the executor. Essentially:
{code}
:report-error-and-die (fn [error]
(try
((:report-error <>) error)
(catch Exception e
(log-message "Error while reporting error to
cluster, proceeding with shutdown")))
(if (or
(exception-cause? InterruptedException
error)
(exception-cause?
java.io.InterruptedIOException error))
(log-message "Got interrupted excpetion shutting
thread down...")
((:suicide-fn <>))))
{code}
has the grouping for the if statement slightly wrong. It shouldn't log OR die
from InterruptedException/InterruptedIOException, but it should log under that
condition, and ALWAYS die.
Basically:
{code}
:report-error-and-die (fn [error]
(try
((:report-error <>) error)
(catch Exception e
(log-message "Error while reporting error to
cluster, proceeding with shutdown")))
(if (or
(exception-cause? InterruptedException
error)
(exception-cause?
java.io.InterruptedIOException error))
(log-message "Got interrupted excpetion shutting
thread down..."))
((:suicide-fn <>)))
{code}
After digging into the Java port of this code, it looks like a different bug
was introduced while porting:
{code}
if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
||
Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
LOG.info("Got interrupted exception shutting thread down...");
suicideFn.run();
}
{code}
Was how this was initially ported, and STORM-2142 changed this to:
{code}
if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
||
Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
LOG.info("Got interrupted exception shutting thread down...");
} else {
suicideFn.run();
}
}
{code}
However, I believe the correct port is as described above:
{code}
if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
||
Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
LOG.info("Got interrupted exception shutting thread down...");
}
suicideFn.run();
}
{code}
I'll look into providing patches for the 1.x and 2.x branches shortly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)