[ 
https://issues.apache.org/jira/browse/FLUME-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410602#comment-13410602
 ] 

Juhani Connolly commented on FLUME-1361:
----------------------------------------

With a setup of:

Exec source tailing Tomcat logs
Sending to a file channel
Which is drained by an Avro sink

With the current implementation of FileChannel and a single disk (so the
checkpoint and data dirs are both on the same disk), we were getting only 10
events/sec throughput. What I have gathered from other discussions, plus my own
assumptions that follow from them (please correct me if this is wrong), is that
this is because each commit triggers an fsync, which then triggers at least 2
seeks (one for the data dir, one for the checkpoint dir) plus seeks for
everything else recently written to disk (e.g. Tomcat logs). On a system with
2-3 disks dedicated exclusively to Flume, the writes would be sequential and
probably not a problem.

With this patch, we were getting full throughput of our live logs (amounting to
650ish events per second per server). I have yet to test what the maximum is,
but regardless, it solves what I believe will be a very common use case (an
exec source tailing logs into a file channel).
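
To illustrate the idea (this is not the patch itself, just a minimal sketch
under my own assumptions): buffer lines as they arrive and commit them in
groups, so that one commit, and therefore one fsync, covers many events. The
class name EventBatcher, the batchSize parameter, and the committer callback
are hypothetical stand-ins for "put a batch on the channel and commit the
transaction"; none of them are Flume APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: groups lines into batches so that one commit covers
// many events instead of one. Not the actual ExecSource code.
class EventBatcher {
    private final int batchSize;                    // assumed option: events per transaction
    private final Consumer<List<String>> committer; // stand-in for "put batch + commit"
    private final List<String> pending = new ArrayList<>();
    private int commits = 0;

    EventBatcher(int batchSize, Consumer<List<String>> committer) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.committer = committer;
    }

    // Buffer one line; commit once the batch is full.
    void add(String line) {
        pending.add(line);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Commit whatever is buffered, e.g. when the tailed process exits.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        committer.accept(new ArrayList<>(pending)); // hand over a copy of the batch
        pending.clear();
        commits++;
    }

    int commits() {
        return commits;
    }

    public static void main(String[] args) {
        List<List<String>> committed = new ArrayList<>();
        // 20 is an arbitrary example batch size, not a recommended value.
        EventBatcher batcher = new EventBatcher(20, committed::add);
        for (int i = 0; i < 650; i++) {
            batcher.add("line " + i);
        }
        batcher.flush(); // push out the final partial batch
        System.out.println(batcher.commits() + " commits for 650 events");
    }
}
```

With the example batch size of 20, one second's worth of our traffic (650
events) needs 33 commits instead of 650. A real source would presumably also
want to flush on a timer so that a slow log does not leave events sitting in
the buffer; that part is left out here.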

Apparently the review requests no longer get auto-linked, so I added a link to
the review request. I'll fix up the docs tomorrow once I get back to my work
computer.
                
> Add event batching to ExecSource
> --------------------------------
>
>                 Key: FLUME-1361
>                 URL: https://issues.apache.org/jira/browse/FLUME-1361
>             Project: Flume
>          Issue Type: Improvement
>            Reporter: Juhani Connolly
>            Assignee: Juhani Connolly
>
> Add a configuration option for the number of items to send to the channel in 
> a single transaction.
> This will help a lot with FileChannel which needs to fsync every commit.


        
