[jira] [Commented] (STORM-969) HDFS Bolt can end up in an unrecoverable state

ASF GitHub Bot (JIRA) Tue, 01 Sep 2015 14:58:16 -0700

    [ 
https://issues.apache.org/jira/browse/STORM-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726276#comment-14726276
 ]


ASF GitHub Bot commented on STORM-969:
--------------------------------------

Github user dossett commented on the pull request:

    https://github.com/apache/storm/pull/664#issuecomment-136876170
  
    @arunmahadevan Our approach is to set the tick tuple frequency to be half 
of the message timeout setting for the topology.  Can the bolt get access to 
that topology setting? prepare() passes a TopologyContext to the bolt but I 
don't see a way to get configuration information out of it.  However, I am not 
that familiar with TopologyContext.
    
    If that's not possible, the bolt could just set a flush frequency of 15 
seconds, since the Storm timeout default is 30 seconds, and assume that anyone 
changing the timeout setting will also adjust the flush frequency.
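
    A minimal sketch of the "half of the message timeout" rule, assuming the 
topology configuration is available as a plain Map (the first argument of 
prepare() is such a map, separate from TopologyContext). The class and method 
names are hypothetical. The two key strings are the ones Storm's Config class 
defines, and the 30 second fallback is the default mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of storm-hdfs.
final class TickFrequency {
    // Key strings as defined in Storm's Config class.
    static final String MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";
    static final String TICK_TUPLE_FREQ_SECS = "topology.tick.tuple.freq.secs";
    // Fallback when the topology does not set a timeout explicitly.
    static final int DEFAULT_MESSAGE_TIMEOUT_SECS = 30;

    private TickFrequency() {}

    // Returns half of the configured message timeout, never less than 1 second.
    static int halfOfTimeout(Map<?, ?> conf) {
        Object value = conf.get(MESSAGE_TIMEOUT_SECS);
        int timeout = (value instanceof Number)
                ? ((Number) value).intValue()
                : DEFAULT_MESSAGE_TIMEOUT_SECS;
        return Math.max(1, timeout / 2);
    }

    // Builds a per-component configuration map that requests tick tuples
    // at the given interval.
    static Map<String, Object> componentConf(int tickSecs) {
        Map<String, Object> result = new HashMap<>();
        result.put(TICK_TUPLE_FREQ_SECS, tickSecs);
        return result;
    }
}
```

    With an empty map, halfOfTimeout() falls back to the 30 second default and 
gives the 15 second flush frequency suggested above. A bolt would typically 
return a map like the one from componentConf() from getComponentConfiguration().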


> HDFS Bolt can end up in an unrecoverable state
> ----------------------------------------------
>
>                 Key: STORM-969
>                 URL: https://issues.apache.org/jira/browse/STORM-969
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-hdfs
>            Reporter: Aaron Dossett
>            Assignee: Aaron Dossett
>
> The body of the HDFSBolt.execute() method is essentially one try-catch block. 
>  The catch block reports the error and fails the current tuple.  In some 
> cases the bolt's FSDataOutputStream object (named 'out') is in an 
> unrecoverable state and no subsequent calls to execute() can succeed.
> To produce this scenario:
> - process some tuples through HDFS bolt
> - put the underlying HDFS system into safemode
> - process some more tuples and receive a correct ClosedChannelException
> - take the underlying HDFS system out of safemode
> - subsequent tuples continue to fail with the same exception
> The three fundamental operations that execute takes (writing, sync'ing, 
> rotating) need to be isolated so that errors from each are specifically 
> handled.
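
The isolation described above can be sketched as follows. This is an 
illustration under stated assumptions, not the actual HdfsBolt code or the 
change in the pull request: IsolatedWriter and StreamFactory are hypothetical 
names, and a plain java.io.OutputStream stands in for FSDataOutputStream. Each 
operation has its own error handling, and a failed stream is discarded so that 
the next call opens a fresh one instead of reusing a broken one.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: write, sync and rotate each handle their own errors.
final class IsolatedWriter {
    // Hypothetical factory that opens a new output stream (e.g. a new file).
    interface StreamFactory {
        OutputStream open() throws IOException;
    }

    private final StreamFactory factory;
    private OutputStream out; // null means "no usable stream right now"

    IsolatedWriter(StreamFactory factory) {
        this.factory = factory;
    }

    // Returns true if the record was written; false means the caller
    // should fail the tuple. A broken stream is dropped, not reused.
    boolean write(byte[] record) {
        try {
            if (out == null) {
                out = factory.open();
            }
            out.write(record);
            return true;
        } catch (IOException e) {
            discardStream();
            return false;
        }
    }

    // Flushes buffered data; a failure here also discards the stream.
    boolean sync() {
        if (out == null) {
            return true; // nothing buffered, nothing to flush
        }
        try {
            out.flush();
            return true;
        } catch (IOException e) {
            discardStream();
            return false;
        }
    }

    // Closes the current stream and opens a new one.
    boolean rotate() {
        discardStream();
        try {
            out = factory.open();
            return true;
        } catch (IOException e) {
            out = null;
            return false;
        }
    }

    // Best-effort close; errors are ignored because the stream is abandoned.
    private void discardStream() {
        if (out != null) {
            try {
                out.close();
            } catch (IOException ignored) {
                // the stream is already considered unusable
            }
            out = null;
        }
    }
}
```

In this sketch a write that fails while the file system is unavailable leaves 
no stream behind, so the first write after recovery opens a new stream rather 
than repeating the earlier exception.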



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
