[ 
https://issues.apache.org/jira/browse/FLUME-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Percy updated FLUME-1301:
------------------------------

    Attachment: FLUME-1301-2.patch

Attaching a patch that appears to provide the correct durability guarantees
for compressed output. I've also added a unit test.

One caveat with this change is that it requires writing what are essentially
concatenated gzip files. As it turns out, the pure-Java implementation of
GzipCodec in Hadoop has a bug, HADOOP-8522; however, the native library seems
to work fine with this change.

Another issue is that it's hard to write unit tests for concatenated gzip files
in Java, since the stock JDK 1.6 GZIPInputStream doesn't support them. So I've
had to rely on manual verification with "gunzip" to ensure that these fixes
work with gzip compression using the native library.
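
To illustrate what "concatenated gzip files" means here, below is a minimal
sketch using only java.util.zip. It is not code from the patch and does not
use Hadoop's GzipCodec. The class and method names (ConcatGzipSketch,
writeMembers, readAll) are made up for the example, and each "batch" stands in
for the data written within one transaction. Reading the result back with
GZIPInputStream assumes a newer JDK (7 or later, as far as I know); on the
stock JDK 1.6 mentioned above, only the first member would be expected to
come back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of Flume or Hadoop.
public class ConcatGzipSketch {

    // Writes each batch as its own complete gzip member into one sink,
    // producing the equivalent of several .gz files concatenated together.
    static byte[] writeMembers(String... batches) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        for (String batch : batches) {
            GZIPOutputStream gz = new GZIPOutputStream(sink);
            gz.write(batch.getBytes(StandardCharsets.UTF_8));
            // finish() writes the remaining compressed data and the gzip
            // trailer without closing the underlying sink. The stream's
            // internal Deflater is left for the garbage collector in this
            // sketch.
            gz.finish();
        }
        return sink.toByteArray();
    }

    // Decompresses everything GZIPInputStream is willing to return.
    // Assumption: the running JDK handles concatenated members.
    static String readAll(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] file = writeMembers("event-1\n", "event-2\n");
        System.out.print(readAll(file));
    }
}
```

Writing the same bytes to a file and running "gunzip" on it is the manual
check described above; gunzip decompresses every member in sequence.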
                
> HDFSCompressedDataStream can lose data
> --------------------------------------
>
>                 Key: FLUME-1301
>                 URL: https://issues.apache.org/jira/browse/FLUME-1301
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0
>            Reporter: Mike Percy
>            Assignee: Mike Percy
>            Priority: Blocker
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1301-2.patch
>
>
> HDFSCompressedDataStream currently uses flush() to flush the compressed 
> streams. Unfortunately, in at least Snappy and GZip, those operations are 
> no-ops. So we have to go back to using finish() in order to ensure the 
> durability of writes at Transaction boundaries.
> In addition to finish(), resetState() must be called.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
