[
https://issues.apache.org/jira/browse/STORM-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660014#comment-14660014
]
ASF GitHub Bot commented on STORM-837:
--------------------------------------
Github user revans2 commented on the pull request:
https://github.com/apache/storm/pull/644#issuecomment-128368560
I am fine with those changes in principle, although I do want to spend
some time reading through the code to convince myself that there are no corner
cases that we are missing.
Also, I would love to see some unit tests showing that we can recover. I
believe hsync works on the local file system, so you would not need to bring up
an HDFS mini cluster; just create a file system using ```file:///...``` as the
URL and write there.
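A minimal sketch of that kind of test setup, assuming Hadoop's FileSystem
API; the path and record below are placeholders, not part of any actual
test suite:
```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalFsHsyncSketch {
    public static void main(String[] args) throws Exception {
        // file:/// points Hadoop's FileSystem at the local disk, so no
        // HDFS mini cluster is needed.
        FileSystem fs = FileSystem.get(
                URI.create("file:///tmp/hdfs-state-test"), new Configuration());
        Path file = new Path("/tmp/hdfs-state-test/part-0000");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("txid=1\trecord\n");
            // hsync (falling back to flush on the local FS) makes the
            // data visible to a reader simulating post-crash recovery.
            out.hsync();
        }
    }
}
```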
> HdfsState ignores commits
> -------------------------
>
> Key: STORM-837
> URL: https://issues.apache.org/jira/browse/STORM-837
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Robert Joseph Evans
> Assignee: Arun Mahadevan
> Priority: Critical
>
> HdfsState works with Trident, which is supposed to provide exactly-once
> processing. Trident does this in two ways: first by informing the state about
> commits so it can be sure the data is written out, and second by providing a
> commit id so that double commits can be handled (a sketch of this contract
> follows the description below).
> HdfsState ignores the beginCommit and commit calls, and with them the
> ids. This means that if you use HdfsState and your worker crashes, you may
> both lose data and get some data twice.
> At a minimum, the flush and file rotation should be tied to the commit in
> some way, and the commit ID should be written out with the data so that
> someone reading the data has a hope of deduping it themselves.
> Also, with the rotationActions it is possible for a partially written file
> to be leaked and never moved to its final location, because it is never
> rotated. I personally think the actions are too generic for this case and
> need to be deprecated.
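For context, a sketch of the Trident contract mentioned in the description,
assuming the storm.trident.state.State interface of that era; the stream
handling and record format here are hypothetical, not HdfsState's actual code:
```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import storm.trident.state.State;

// Illustrative only: honor the Trident contract by flushing on commit and
// recording the txid with the data so a reader can dedupe replayed batches.
public class CommitAwareHdfsState implements State {
    private final FSDataOutputStream out;   // stream to the current file
    private Long lastCommittedTxId;         // highest txid known durable

    public CommitAwareHdfsState(FSDataOutputStream out) {
        this.out = out;
    }

    @Override
    public void beginCommit(Long txid) {
        // A txid at or below lastCommittedTxId means Trident is replaying a
        // batch that already reached the file; a real implementation would
        // skip or truncate rather than write it twice.
    }

    @Override
    public void commit(Long txid) {
        try {
            // Tag the batch with its txid so downstream readers can dedupe.
            out.writeBytes("commit\t" + txid + "\n");
            out.hsync();                    // make the batch durable
            lastCommittedTxId = txid;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```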
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)