[ 
https://issues.apache.org/jira/browse/FLUME-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772139#comment-13772139
 ] 

Hari Shreedharan commented on FLUME-2128:
-----------------------------------------

Getting this working on HDFSCompressedDataStream is a bit tricky. The 
HDFSCompressedDataStream class has an FSDataOutputStream member, fsOut. 
FSDataOutputStream exposes a getPos() method that returns the next write 
position, which is effectively the length of the file written so far. You can 
use this to get the length and decide when to roll. You might want to test it 
out, but it looks like it should work.
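
A minimal sketch of the idea, using only the JDK so it can be run without 
Hadoop. Everything here is an assumption for illustration: 
PositionTrackingStream is a hypothetical stand-in for fsOut (it only mimics 
getPos() by counting bytes that reach the underlying stream), GZIPOutputStream 
stands in for whatever codec the sink is configured with, and shouldRoll() and 
writeEvents() are made-up helper names, not Flume code.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class RollBySizeSketch {

    /** Hypothetical stand-in for fsOut: counts bytes that reach the sink. */
    static class PositionTrackingStream extends FilterOutputStream {
        private long pos = 0;

        PositionTrackingStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            pos++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            pos += len;
        }

        /** Next write position, i.e. bytes written below the compressor. */
        long getPos() {
            return pos;
        }
    }

    /** Roll once the compressed position reaches rollSize; 0 disables it. */
    static boolean shouldRoll(long compressedPos, long rollSize) {
        return rollSize > 0 && compressedPos >= rollSize;
    }

    /** Writes count events; returns {uncompressed bytes, compressed bytes}. */
    static long[] writeEvents(int count) {
        try {
            PositionTrackingStream fsOut =
                new PositionTrackingStream(new ByteArrayOutputStream());
            // syncFlush=true so flush() pushes pending compressed data down.
            GZIPOutputStream cmpOut = new GZIPOutputStream(fsOut, true);
            byte[] event = "a highly repetitive log line that compresses well\n"
                .getBytes(StandardCharsets.UTF_8);
            long uncompressed = 0;
            for (int i = 0; i < count; i++) {
                cmpOut.write(event);
                uncompressed += event.length;
            }
            cmpOut.flush();
            return new long[] { uncompressed, fsOut.getPos() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = writeEvents(1000);
        System.out.println("uncompressed=" + sizes[0]
            + " compressed=" + sizes[1]
            + " roll at 1 MB? " + shouldRoll(sizes[1], 1024 * 1024));
    }
}
```

One thing the sketch suggests checking in the real sink: the compressor 
buffers data, so the position below it only advances once compressed bytes are 
actually pushed down. The value may lag behind what has been appended unless 
the compressed stream is flushed first.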
                
> HDFS Sink rollSize is calculated based off of uncompressed size of cumulative 
> events.
> -------------------------------------------------------------------------------------
>
>                 Key: FLUME-2128
>                 URL: https://issues.apache.org/jira/browse/FLUME-2128
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0, v1.3.1
>            Reporter: Jeff Lord
>            Assignee: Ted Malaska
>              Labels: features
>         Attachments: FLUME-2128-0.patch
>
>
> The HDFS sink rollSize parameter is compared against uncompressed event sizes.
> The net effect is that if you are using compression and expect your files on 
> HDFS to be rolled/sized based on the value set for rollSize, then your files 
> will be much smaller due to compression.
> We should take into account when compression is set and roll based on the 
> compressed size on HDFS.
