[jira] [Commented] (FLUME-3439) Introduce SynchronousChannel, a fast disk-less channel that doesn't lose events

Ralph Goers (Jira) Wed, 05 Oct 2022 10:24:35 -0700


    [ 
https://issues.apache.org/jira/browse/FLUME-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613083#comment-17613083
 ]


Ralph Goers commented on FLUME-3439:
------------------------------------

OK. I do see that the put commit blocks until the take commit is done. 

While I see now that it does function as you describe, I can envision many use 
cases in which it would not be suitable, as this essentially just makes Flume 
a synchronous proxy between the originator of the data and the destination.
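
A minimal sketch of the blocking behavior described above, where the put 
commit does not return until the take commit is done. It assumes a 
zero-capacity hand-off built on java.util.concurrent.SynchronousQueue plus a 
CountDownLatch. The names HandoffSketch, putAndCommit, take and commitTake are 
hypothetical; this is not the code from the linked repository, and a real 
channel would implement Flume's Channel and Transaction interfaces instead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not the SynchronousChannel implementation. */
public class HandoffSketch {

    /** One event plus a latch that the taking side releases when it commits. */
    public static final class Handoff {
        public final String event;
        final CountDownLatch takeCommitted = new CountDownLatch(1);

        Handoff(String event) {
            this.event = event;
        }
    }

    // Zero-capacity queue: offer() only succeeds while a taker is waiting in poll().
    private final SynchronousQueue<Handoff> queue = new SynchronousQueue<>();

    /**
     * Put side. Returns true only after the taker has committed.
     * Returns false if no taker arrived, or if the taker did not commit in time
     * (in that second case the caller cannot tell whether the event was delivered).
     */
    public boolean putAndCommit(String event, long timeoutMillis) throws InterruptedException {
        Handoff h = new Handoff(event);
        if (!queue.offer(h, timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        return h.takeCommitted.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Take side. Returns null if no put arrived within the timeout. */
    public Handoff take(long timeoutMillis) throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Take side commit. Unblocks the matching putAndCommit() call. */
    public void commitTake(Handoff h) {
        h.takeCommitted.countDown();
    }

    public static void main(String[] args) throws InterruptedException {
        HandoffSketch ch = new HandoffSketch();
        Thread sink = new Thread(() -> {
            try {
                Handoff h = ch.take(2000);
                if (h != null) {
                    // A real sink would forward h.event downstream before committing.
                    ch.commitTake(h);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();
        System.out.println("put committed: " + ch.putAndCommit("event-1", 2000));
        sink.join();
    }
}
```

Because nothing is ever queued, the source side only advances when the sink 
side has confirmed delivery, which is the synchronous-proxy behavior noted 
above.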



> Introduce SynchronousChannel, a fast disk-less channel that doesn't lose 
> events
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-3439
>                 URL: https://issues.apache.org/jira/browse/FLUME-3439
>             Project: Flume
>          Issue Type: Improvement
>          Components: Channel
>            Reporter: Eiichi Sato
>            Priority: Major
>
> Recently, I implemented 
> [SynchronousChannel|https://github.com/eiiches/flume-synchronous-channel], in 
> which every transaction that puts events waits for corresponding transactions 
> that take the events to complete.
>  * It's fast because it doesn't use disks.
>  * It doesn't lose events because it doesn't actually store events. It has no 
> capacity.
> The motivation behind this channel is that, when using a Taildir Source to 
> collect logs and send them to a remote Flume instance, we typically use 
> File Channel or Memory Channel. Memory Channel is fast, but can lose 
> events. File Channel is durable, but slow. Using a File Channel also means we 
> are writing the same contents to disk twice: once for the log file that 
> Taildir Source is watching and again for the channel data. We don't need 
> to buffer events in a channel because the events are already in a log file 
> and Taildir Source can just read at its own pace.
> Expected use cases are:
>  * Taildir Source --> Synchronous Channel --> Avro Sink
>  * Kinesis Source --> Synchronous Channel --> Avro Sink
>  * Cloud Pub/Sub Source --> Synchronous Channel --> Avro Sink
> In all these cases, the channel doesn't need to buffer events because the 
> source already works like a buffer.
> In [this 
> benchmark|https://github.com/eiiches/flume-synchronous-channel/tree/main/docs/benchmark]
>  that uses Taildir Source + Synchronous Channel, I observed an 84% increase in 
> throughput and a 75-81% reduction in CPU usage compared to File Channel when 
> the event body is 512 bytes.
>  
> ----
>  
> The code is around 220 LOC (excluding tests) and doesn't pull in any 
> additional third-party dependencies.
> I can work on a PR, but before doing so, I would like some general feedback 
> from the community. I'm wondering whether this channel is useful or generic 
> enough to be included in Flume, or whether it should be kept in a separate 
> repository. What do you think?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@flume.apache.org
For additional commands, e-mail: issues-h...@flume.apache.org
