[
https://issues.apache.org/jira/browse/FLUME-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Malaska updated FLUME-2128:
-------------------------------
Attachment: FLUME-2128-2.patch
First implementation of Mike's idea.
The code takes a starting expectedCompressionRatio and updates it every time
the file rolls. The new value is a weighted average that counts the ratio
observed on the most recent roll twice and the previous estimate once:
((processSize / resulting file size) * 2 + lastExpectedCompressionRatio) / 3
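
For readers who have not opened the patch, the update rule above can be
sketched roughly as follows. This is an illustration only, not the code in
FLUME-2128-2.patch: the class name, the method names, and the shouldRoll
check (dividing the uncompressed byte count by the current estimate) are
all assumptions made for the example.

```java
/**
 * Hypothetical sketch of the ratio update described above.
 * None of these names are taken from the actual patch.
 */
public class CompressionRatioEstimator {

    // Current estimate of uncompressed bytes per compressed byte.
    private double expectedCompressionRatio;

    public CompressionRatioEstimator(double initialRatio) {
        if (initialRatio <= 0) {
            throw new IllegalArgumentException("initialRatio must be positive");
        }
        this.expectedCompressionRatio = initialRatio;
    }

    /**
     * Called after a file rolls. processSize is the uncompressed byte count
     * written to the file; resultingFileSize is its size on disk.
     */
    public void onRoll(long processSize, long resultingFileSize) {
        if (processSize <= 0 || resultingFileSize <= 0) {
            return; // nothing useful to learn from an empty file
        }
        double observed = (double) processSize / resultingFileSize;
        // The latest observation is weighted 2, the previous estimate 1.
        expectedCompressionRatio = (observed * 2 + expectedCompressionRatio) / 3;
    }

    /**
     * Assumed usage: roll once the estimated compressed size reaches rollSize.
     */
    public boolean shouldRoll(long processSize, long rollSize) {
        return rollSize > 0 && processSize / expectedCompressionRatio >= rollSize;
    }

    public double getExpectedCompressionRatio() {
        return expectedCompressionRatio;
    }
}
```

Starting from a ratio of 1.0, a roll that wrote 400 uncompressed bytes into a
100-byte file gives (4.0 * 2 + 1.0) / 3 = 3.0, so the estimate moves most of
the way toward the observed ratio after a single roll.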
> HDFS Sink rollSize is calculated based off of uncompressed size of cumulative
> events.
> -------------------------------------------------------------------------------------
>
> Key: FLUME-2128
> URL: https://issues.apache.org/jira/browse/FLUME-2128
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.4.0, v1.3.1
> Reporter: Jeff Lord
> Assignee: Ted Malaska
> Labels: features
> Attachments: FLUME-2128-0.patch, FLUME-2128-1.patch,
> FLUME-2128-2.patch
>
>
> The HDFS sink rollSize parameter is compared against uncompressed event sizes.
> The net effect is that if you are using compression and expect your files on
> HDFS to be rolled/sized based on the value set for rollSize, then your files
> will be much smaller than rollSize because of compression.
> We should take into account when compression is set and roll based on the
> compressed size on HDFS.