[ https://issues.apache.org/jira/browse/FLUME-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385218#comment-15385218 ]

ASF subversion and git services commented on FLUME-2922:
--------------------------------------------------------

Commit 358bb670029549ed4cff192c79307fd3e4d69972 in flume's branch 
refs/heads/trunk from [~kevinconaway]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=358bb67 ]

FLUME-2922. Sync SequenceFile.Writer before calling hflush

This closes #52

(Kevin Conaway via Mike Percy)
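
The committed change, per the message above, syncs the writer before
flushing the stream.  Below is a minimal standalone sketch of that ordering
against Hadoop's SequenceFile API; the file path, key/value types, and the
setup around the two calls are illustrative assumptions, not code taken
from the patch.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;

    public class SyncBeforeHflush {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Keep a handle on the raw stream, as the HDFS sink does, so it
        // can be hflushed directly after the writer has been synced.
        FSDataOutputStream outStream =
            fs.create(new Path("/tmp/flume-2922-demo.seq"));
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.stream(outStream),
            SequenceFile.Writer.keyClass(LongWritable.class),
            SequenceFile.Writer.valueClass(BytesWritable.class));

        writer.append(new LongWritable(1L),
            new BytesWritable("event body".getBytes(StandardCharsets.UTF_8)));

        // The fix: first push the writer's internal buffer (and, under
        // block compression, the current block) down to outStream...
        writer.sync();
        // ...and only then flush the stream, so the appended records are
        // part of what hflush actually makes visible.
        outStream.hflush();

        writer.close();
        outStream.close();
      }
    }

Without the writer.sync() call, hflush can run while the appended
key/values are still sitting in the writer's buffer, which is exactly the
loss window the issue below describes.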


> HDFSSequenceFile Should Sync Writer
> -----------------------------------
>
>                 Key: FLUME-2922
>                 URL: https://issues.apache.org/jira/browse/FLUME-2922
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.6.0
>            Reporter: Kevin Conaway
>            Priority: Critical
>         Attachments: FLUME-2922.patch
>
>
> There is a possibility of losing data with the current HDFS sequence file 
> writer.
> Internally, the `SequenceFile.Writer` buffers data and only periodically
> syncs it to the underlying output stream.  The exact mechanism depends on
> whether compression is enabled, but in both cases the key/values are
> appended to an internal buffer and written out only once the buffer reaches
> a certain size.
> Thus it is quite possible for Flume to lose messages if the agent crashes, or 
> is stopped, before the internal buffer is flushed to disk.
> The correct action is to force the writer to sync its internal buffers to
> the underlying `FSDataOutputStream` before calling hflush/sync.
> Additionally, I believe we should be calling hsync instead of hflush.  It's
> my understanding that writes with hsync are more durable, which I believe
> are the semantics we want here.
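
On the hflush vs. hsync point above: under HDFS's Syncable contract, hflush
pushes buffered bytes out to the DataNode pipeline so that new readers can
see them, while hsync additionally asks each DataNode to persist the bytes
to local disk.  A two-line sketch of the distinction, reusing the
hypothetical outStream from the sketch above:

    // hflush(): the DataNodes have received the bytes and new readers
    // can see them, but the data may still be only in DataNode memory.
    outStream.hflush();

    // hsync(): same visibility, plus each DataNode flushes the bytes to
    // its local disk, so they survive a crash of the node itself.
    outStream.hsync();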



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
