[
https://issues.apache.org/jira/browse/FLUME-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132358#comment-14132358
]
Hari Shreedharan commented on FLUME-2352:
-----------------------------------------
This seems like a good idea. I did a quick review and it looks good. Since the
serializer is the same for the life of the sink, we don't need to do an
instanceof check every time we write an event. We only need to do it once and
reuse the result. We should fix that.
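To make the suggestion concrete, here is a minimal sketch of caching the
instanceof result at open time instead of per event. It is not the actual
patch: Event, EventSerializer, BatchCapableSerializer and the
serializerSupportsBatch field are hypothetical stand-ins for whatever type the
instanceof check in HDFSCompressedDataStream really tests against.
{code:java}
import java.io.IOException;
import java.util.List;

// Hypothetical stand-ins for the Flume types involved; the real code lives in
// org.apache.flume.sink.hdfs.HDFSCompressedDataStream.
interface Event {}

interface EventSerializer {
  void write(Event e) throws IOException;
}

interface BatchCapableSerializer extends EventSerializer {
  void writeBatch(List<Event> events) throws IOException;
}

class CompressedStreamSketch {
  private EventSerializer serializer;
  // Computed once when the stream is opened; the serializer does not change
  // for the life of the sink, so no per-event instanceof check is needed.
  private boolean serializerSupportsBatch;

  void open(EventSerializer serializer) {
    this.serializer = serializer;
    this.serializerSupportsBatch = serializer instanceof BatchCapableSerializer;
  }

  void appendBatch(List<Event> events) throws IOException {
    if (serializerSupportsBatch) {
      // Batch-capable path, decided once at open time.
      ((BatchCapableSerializer) serializer).writeBatch(events);
    } else {
      // Fallback: write events one by one.
      for (Event e : events) {
        serializer.write(e);
      }
    }
  }
}
{code}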
> HDFSCompressedDataStream should support appendBatch
> ---------------------------------------------------
>
> Key: FLUME-2352
> URL: https://issues.apache.org/jira/browse/FLUME-2352
> Project: Flume
> Issue Type: Improvement
> Components: Sinks+Sources
> Affects Versions: v1.5.0
> Reporter: chenshangan
> Assignee: chenshangan
> Fix For: v1.6.0
>
> Attachments: FLUME-2352.patch
>
>
> Compressing events in a batch is much more efficient than compressing them
> one by one.
> With hdfs.batchSize set to 200000, using appendBatch() in BucketWriter the
> append operation takes less than 1 second, while appending one by one can
> take around 10 seconds.
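
To illustrate why the batched path in the description above is faster, here is
a rough sketch, not the Flume implementation: a plain java.util.zip
GZIPOutputStream stands in for the Hadoop CompressionOutputStream that
HDFSCompressedDataStream wraps, and the append/appendBatch shapes only mirror
the BucketWriter calls.
{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.zip.GZIPOutputStream;

class BatchedCompressedWriter {
  private final GZIPOutputStream out;

  BatchedCompressedWriter(OutputStream raw) throws IOException {
    this.out = new GZIPOutputStream(raw);
  }

  // One-by-one path: every event is followed by its own flush, so the
  // compressor does extra work per event.
  void append(byte[] eventBody) throws IOException {
    out.write(eventBody);
    out.flush();
  }

  // Batched path: write the whole batch, then flush the compressor once.
  // This is where a speedup like the one reported above (around 10s down
  // to under 1s at hdfs.batchSize = 200000) would come from.
  void appendBatch(List<byte[]> eventBodies) throws IOException {
    for (byte[] body : eventBodies) {
      out.write(body);
    }
    out.flush();
  }

  void close() throws IOException {
    out.finish();
    out.close();
  }
}
{code}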
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)