[jira] [Commented] (FLUME-3268) Introducing micro batch processing to HDFSEventSink

zhenzhao wang (JIRA) Tue, 14 Aug 2018 17:51:26 -0700


    [ 
https://issues.apache.org/jira/browse/FLUME-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580590#comment-16580590
 ]


zhenzhao wang commented on FLUME-3268:
--------------------------------------

[~fszabo] hdfs.batchSize only controls the flush frequency. The sink still 
calls the append RPC once for each event. What we are going to do is batch 
events for the HDFS append itself. I found some problems with the current pull 
request while trying to refactor the code, so I will let you know when it is 
ready for review.
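
To illustrate the difference described above, here is a minimal sketch in plain 
Java. It is not the code from the patch and does not use any Flume or Hadoop 
API. The class MicroBatchSketch, the CountingStream stand-in, and the 
microBatchSize parameter are all hypothetical names chosen for illustration. 
It assumes that each write() call on the stream corresponds to one append 
call, and only shows how buffering events reduces the number of such calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class MicroBatchSketch {

    /** Hypothetical stand-in for an HDFS stream: counts write calls. */
    static class CountingStream extends OutputStream {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int writeCalls = 0;

        @Override
        public void write(int b) {
            writeCalls++;
            sink.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
            sink.write(b, off, len);
        }
    }

    /** Per-event behavior: one write call for every event. */
    static void appendOneByOne(List<byte[]> events, OutputStream out)
            throws IOException {
        for (byte[] event : events) {
            out.write(event);
        }
    }

    /** Micro batch: buffer up to microBatchSize events, then write once. */
    static void appendMicroBatched(List<byte[]> events, OutputStream out,
                                   int microBatchSize) throws IOException {
        if (microBatchSize < 1) {
            throw new IllegalArgumentException("microBatchSize must be >= 1");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int buffered = 0;
        for (byte[] event : events) {
            buffer.write(event, 0, event.length);
            buffered++;
            if (buffered == microBatchSize) {
                out.write(buffer.toByteArray());
                buffer.reset();
                buffered = 0;
            }
        }
        // Write whatever is left in a final, smaller batch.
        if (buffered > 0) {
            out.write(buffer.toByteArray());
        }
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> events = Arrays.asList(
                "e1\n".getBytes(StandardCharsets.UTF_8),
                "e2\n".getBytes(StandardCharsets.UTF_8),
                "e3\n".getBytes(StandardCharsets.UTF_8),
                "e4\n".getBytes(StandardCharsets.UTF_8),
                "e5\n".getBytes(StandardCharsets.UTF_8));

        CountingStream perEvent = new CountingStream();
        appendOneByOne(events, perEvent);

        CountingStream batched = new CountingStream();
        appendMicroBatched(events, batched, 2);

        System.out.println("per-event write calls: " + perEvent.writeCalls);
        System.out.println("micro-batched write calls: " + batched.writeCalls);
    }
}
```

Both methods produce the same bytes in the same order. Only the number of 
write calls changes, which is the cost the micro batch is meant to reduce. 
Whether this translates into the speedup reported in the issue depends on the 
real HDFS writer and is not something this sketch measures.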

> Introducing micro batch processing to HDFSEventSink
> ---------------------------------------------------
>
>                 Key: FLUME-3268
>                 URL: https://issues.apache.org/jira/browse/FLUME-3268
>             Project: Flume
>          Issue Type: New Feature
>            Reporter: zhenzhao wang
>            Priority: Major
>         Attachments: FLUME-3268-0.patch
>
>
> In our tests with HDFSEventSink, we found that we could increase the 
> draining speed of the sink by up to 4x by introducing micro batch 
> processing. With the micro batch processing feature, events are written to 
> HDFS in batches instead of one by one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
