Robert Joseph Evans created STORM-2667:
------------------------------------------

             Summary: Exception Handling in the AbstractHdfsBolt causes bolt to 
restart
                 Key: STORM-2667
                 URL: https://issues.apache.org/jira/browse/STORM-2667
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-hdfs
            Reporter: Robert Joseph Evans
            Priority: Minor


Recently while reviewing the HDFSBolt code because of a question on the mailing 
list, I noticed that the abstract bolt will fail a tuple if an IOException is 
thrown while trying to write it out, and then force a sync in those cases.

https://github.com/apache/storm/blob/64e29f365c9b5d3e15b33f33ab64e200345333e4/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/AbstractHdfsBolt.java#L150-L160

A RuntimeException thrown by the formatter, on the other hand, bubbles up and 
forces the worker to restart.

Any IOException thrown by an HDFS output stream means that at that point the 
stream is closed and cannot be used any more.  As part of our recovery we will 
try to sync, but this will also fail because the stream was closed by the 
exception that was thrown.  The sync failure results in a RuntimeException 
being thrown and the entire worker being restarted.

The current code "works" and eventually will recover from these issues, but it 
may take a while.  It also means that we are likely to have more data loss than 
needed for some output formats.

I would suggest that we try to recover from RuntimeExceptions in the same way 
that we are trying to recover from IOExceptions now.  I also would suggest 
that we handle the special case where {{tupleBatch.size()}} is 0 but we got an 
IOException from the writer: the forceSync will not happen, so tuples will 
continue to fail until the sync policy decides to sync, at which point the 
worker will crash and then recover.
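A minimal sketch of the suggested uniform handling, assuming a simplified writer interface.  All names here ({{RecoverySketch}}, {{HdfsWriter}}, {{streamRecovered}}) are hypothetical stand-ins, not the actual AbstractHdfsBolt fields or storm-hdfs API:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed recovery path; not the real bolt implementation.
public class RecoverySketch {

    // Hypothetical stand-in for the real HDFS writer/stream.
    interface HdfsWriter {
        void write(String tuple) throws IOException;
        void sync() throws IOException;
    }

    final List<String> tupleBatch = new ArrayList<>();
    final HdfsWriter writer;
    boolean streamRecovered = false;

    RecoverySketch(HdfsWriter writer) {
        this.writer = writer;
    }

    void execute(String tuple) {
        try {
            writer.write(tuple);
            tupleBatch.add(tuple);
        } catch (IOException | RuntimeException e) {
            // Treat a formatter RuntimeException the same as an IOException
            // instead of letting it bubble up and restart the worker.
            tupleBatch.clear();  // fail the batched tuples (ack/fail elided)

            // Attempt recovery even when tupleBatch is empty, so later
            // tuples do not keep failing until the sync policy fires.
            try {
                writer.sync();
            } catch (IOException | RuntimeException syncFailure) {
                // The stream is already dead; a real fix would rotate or
                // reopen the file here rather than rethrow and crash.
            }
            streamRecovered = true;
        }
    }
}
```

The key points are the multi-catch of both exception types on the write path and performing the recovery step unconditionally, rather than only when a non-empty batch triggers a forceSync.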



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
