[ 
https://issues.apache.org/jira/browse/STORM-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659568#comment-14659568
 ] 

ASF GitHub Bot commented on STORM-837:
--------------------------------------

Github user arunmahadevan commented on the pull request:

    https://github.com/apache/storm/pull/644#issuecomment-128266209
  
    I was trying to put a safety net in place so that recovery would always 
work. The limitation on the timed rotation policy follows the same reasoning: we 
don't know how much data will be written to the files before the timed 
rotation kicks in.
    
    So,
    
    - For file size rotation, remove the restrictions on the file size and just 
log warnings.
    - For time-based rotation, set a flag and do the actual rotation in 
`doCommit` as you suggested.
    - Add a note in the README about the risk that recovery will fail if 
files cannot be recovered within the message timeout, so files should be kept 
to a reasonable size (and rotation interval) or the message timeout should be 
increased.
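
A minimal sketch of the time-based rotation idea from the second bullet, assuming a timer callback that only sets a flag and a commit hook that performs the rotation. The class `DeferredRotationSketch`, the methods `onRotationTimer` and `write`, and the `doCommit(long)` signature are hypothetical stand-ins for illustration, not the actual HdfsState code, and the in-memory list stands in for an open HDFS file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the state object; not the real HdfsState.
class DeferredRotationSketch {
    // Set by the timer thread, consumed on the commit path.
    private final AtomicBoolean rotationDue = new AtomicBoolean(false);
    // Stand-in for the currently open file.
    private final List<String> currentFile = new ArrayList<>();
    private int rotationCount = 0;
    private long lastCommittedTxId = -1L;

    // Called when the rotation interval elapses. It only records that a
    // rotation is due and never touches the file itself.
    void onRotationTimer() {
        rotationDue.set(true);
    }

    // Appends a tuple to the current file.
    void write(String tuple) {
        currentFile.add(tuple);
    }

    // Called when a batch commits. Rotating only here keeps every file
    // boundary aligned with a batch boundary.
    void doCommit(long txId) {
        lastCommittedTxId = txId;
        // Clear the flag and rotate at most once per pending request.
        if (rotationDue.compareAndSet(true, false)) {
            rotate();
        }
    }

    // Stand-in for closing the current file and opening a new one.
    private void rotate() {
        currentFile.clear();
        rotationCount++;
    }

    int rotationCount() { return rotationCount; }
    int pendingTuples() { return currentFile.size(); }
    long lastCommittedTxId() { return lastCommittedTxId; }
}
```

In this sketch a timer firing in the middle of a batch leaves the file untouched until the next `doCommit`, so a rotated file never ends on a partially written batch.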
    
    Does that sound reasonable? If we agree, I will make the changes.
    
    



> HdfsState ignores commits
> -------------------------
>
>                 Key: STORM-837
>                 URL: https://issues.apache.org/jira/browse/STORM-837
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Robert Joseph Evans
>            Assignee: Arun Mahadevan
>            Priority: Critical
>
> HdfsState works with Trident, which is supposed to provide exactly-once 
> processing.  It does this in two ways: first, by informing the state about 
> commits so it can be sure the data is written out, and second, by having a 
> commit id, so that double commits can be handled.
> HdfsState ignores the beginCommit and commit calls, and with that ignores the 
> ids.  This means that if you use HdfsState and your worker crashes you may 
> both lose data and get some data twice.
> At a minimum the flush and file rotation should be tied to the commit in some 
> way.  The commit ID should at a minimum be written out with the data so 
> someone reading the data can have a hope of deduping it themselves.
> Also, with the rotationActions it is possible for a file that was partially 
> written to be leaked, and never moved to the final location, because it is not 
> rotated.  I personally think the actions are too generic for this case and 
> need to be deprecated.
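
The double-commit handling described in the issue can be illustrated with a small sketch, assuming batches arrive in txid order and a replayed batch reuses its original txid. The class `TxIdGuardSketch` and its `update` method are hypothetical, the method signatures are simplified, and the in-memory fields stand in for data and a commit id that a real state would have to persist so they survive a worker crash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical state that uses the commit id to ignore replayed batches.
class TxIdGuardSketch {
    // Assumption: a real state would store this id durably with the data.
    private long lastCommittedTxId = -1L;
    private boolean replay = false;
    private final List<String> pending = new ArrayList<>();
    private final List<String> durable = new ArrayList<>();

    // Start of a batch: a txid equal to the last committed one is a replay.
    void beginCommit(long txId) {
        replay = (txId == lastCommittedTxId);
        pending.clear();
    }

    // Buffer a tuple unless the whole batch is a replay.
    void update(String tuple) {
        if (!replay) {
            pending.add(tuple);
        }
    }

    // End of a batch: make the buffered tuples durable and record the txid.
    void commit(long txId) {
        if (!replay) {
            durable.addAll(pending);
            lastCommittedTxId = txId;
        }
        pending.clear();
    }

    List<String> durableTuples() { return durable; }
}
```

Because data only becomes durable in `commit`, a batch that fails before `commit` leaves nothing behind, and a batch replayed after a successful `commit` is skipped.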



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
