Robert Joseph Evans created STORM-2667:
------------------------------------------
Summary: Exception Handling in the AbstractHdfsBolt causes bolt to
restart
Key: STORM-2667
URL: https://issues.apache.org/jira/browse/STORM-2667
Project: Apache Storm
Issue Type: Bug
Components: storm-hdfs
Reporter: Robert Joseph Evans
Priority: Minor
While recently reviewing the HDFSBolt code because of a question on the mailing
list, I noticed that the abstract bolt fails a tuple if an IOException is
thrown while writing it out, and then forces a sync in those cases.
https://github.com/apache/storm/blob/64e29f365c9b5d3e15b33f33ab64e200345333e4/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/AbstractHdfsBolt.java#L150-L160
A RuntimeException thrown by the formatter, on the other hand, bubbles up and
forces the worker to restart.
Any IOException thrown by an HDFS output stream means that the stream is
closed at that point and cannot be used any more. As part of our recovery we
try to sync, but that also fails because the stream was closed by the original
exception; the failed sync results in a RuntimeException being thrown, and the
entire worker being restarted.
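The cascade above can be sketched with a minimal, self-contained stand-in (all class and method names here are hypothetical, not the actual storm-hdfs code): the write fails with an IOException, the recovery path forces a sync on the same now-closed stream, and the wrapped RuntimeException escapes and kills the worker.

```java
import java.io.IOException;

public class FailureCascadeSketch {

    // Stand-in for an HDFS output stream that is closed by the first failure.
    static class FlakyStream {
        boolean closed = false;

        void write(byte[] b) throws IOException {
            closed = true;                       // HDFS closes the stream on error
            throw new IOException("write failed");
        }

        void hsync() throws IOException {
            if (closed) {
                throw new IOException("stream already closed");
            }
        }
    }

    public static void main(String[] args) {
        FlakyStream out = new FlakyStream();
        boolean workerDied = false;
        try {
            try {
                out.write(new byte[]{1});
            } catch (IOException e) {
                // Mirrors the recovery path: fail the tuple, then force a sync.
                try {
                    out.hsync();                 // fails too: stream is closed
                } catch (IOException syncError) {
                    // The sync failure surfaces as a RuntimeException, which
                    // escapes execute() and restarts the whole worker.
                    throw new RuntimeException(syncError);
                }
            }
        } catch (RuntimeException fatal) {
            workerDied = true;
        }
        System.out.println("workerDied=" + workerDied);
    }
}
```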
The current code "works" and eventually recovers from these issues, but it may
take a while. It also means that we are likely to lose more data than
necessary for some output formats.
I would suggest that we try to recover from RuntimeExceptions in the same way
that we currently recover from IOExceptions. I would also suggest that we
handle the special case where {{tupleBatch.size()}} is 0 but the writer threw
an IOException: in that case the forceSync never happens, so tuples will
continue to fail until the sync policy decides to sync, at which point the
worker will crash and then recover.
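A hedged sketch of the suggested handling, again with hypothetical names (Writer, recover) rather than the real storm-hdfs API: catch RuntimeException alongside IOException, fail the tuple, and run recovery unconditionally so it happens even when the pending batch is empty.

```java
import java.io.IOException;

public class RecoverySketch {

    // Stand-in writer whose format/write step throws a RuntimeException.
    static class Writer {
        boolean healthy = true;
        int pendingBatchSize = 0;            // models tupleBatch.size() == 0

        void write(String tuple) throws IOException {
            healthy = false;
            throw new RuntimeException("formatter blew up");
        }

        void recover() {
            healthy = true;                  // e.g. close old stream, rotate file
        }
    }

    // Returns true if the tuple was acked, false if it was failed.
    static boolean execute(Writer writer, String tuple) {
        try {
            writer.write(tuple);
            return true;
        } catch (IOException | RuntimeException e) {
            // Fail the tuple, but recover in BOTH exception cases, and do so
            // even when the batch is empty, so later tuples see a healthy
            // writer instead of failing until the sync policy fires.
            writer.recover();
            return false;
        }
    }

    public static void main(String[] args) {
        Writer w = new Writer();
        boolean acked = execute(w, "t1");
        System.out.println("acked=" + acked + " healthy=" + w.healthy);
    }
}
```

The key design point is the multi-catch: the worker stays alive for RuntimeExceptions exactly as it does for IOExceptions, instead of letting them escape and restart the JVM.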
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)