[jira] [Commented] (FLUME-3439) Introduce SynchronousChannel, a fast disk-less channel that doesn't lose events

Ralph Goers (Jira) Tue, 04 Oct 2022 19:40:08 -0700


    [ https://issues.apache.org/jira/browse/FLUME-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612837#comment-17612837 ]


Ralph Goers commented on FLUME-3439:
------------------------------------

I just looked at this and I think there are some problems with the description. 
# It isn't possible for a Channel to be synchronous. The Source and Sink run on 
different threads. The Channel only waits until events in the putList have been 
consumed. That doesn't imply they were written.
# The only difference I see between this and MemoryChannel is that 
MemoryChannel requires a limit on the number of events that can be queued, 
whereas this channel does not. So if the sink fails for some reason, Flume 
could encounter an OutOfMemoryError using this channel, whereas MemoryChannel 
will refuse to accept new events once its limit is reached.

Based on this I don't see how you can claim it doesn't lose events.
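
The concern in the second point can be illustrated with a minimal sketch. It 
uses plain java.util.concurrent queues, not Flume's Channel API, and the class 
and method names (QueueLimitSketch, offerAll) are hypothetical. It only shows 
the general difference between a bounded and an unbounded queue when nothing 
is consuming; it makes no claim about how either channel is implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; this is not Flume code.
class QueueLimitSketch {
    /** Offers `count` events with no consumer running; returns how many were accepted. */
    static int offerAll(BlockingQueue<String> queue, int count) {
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            // offer() never blocks: it returns false when a bounded queue is full.
            if (queue.offer("event-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Bounded: stands in for a channel with a configured event limit.
        int bounded = offerAll(new ArrayBlockingQueue<>(100), 1_000);
        // Unbounded: stands in for a channel with no limit on queued events.
        int unbounded = offerAll(new LinkedBlockingQueue<>(), 1_000);
        System.out.println("bounded accepted " + bounded
                + ", unbounded accepted " + unbounded);
    }
}
```

With a stalled consumer, the bounded queue pushes back on the producer, while 
the unbounded one keeps accepting events until the heap is exhausted.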

> Introduce SynchronousChannel, a fast disk-less channel that doesn't lose 
> events
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-3439
>                 URL: https://issues.apache.org/jira/browse/FLUME-3439
>             Project: Flume
>          Issue Type: Improvement
>          Components: Channel
>            Reporter: Eiichi Sato
>            Priority: Major
>
> Recently, I implemented 
> [SynchronousChannel|https://github.com/eiiches/flume-synchronous-channel], in 
> which every transaction that puts events waits for corresponding transactions 
> that take the events to complete.
>  * It's fast because it doesn't use disks.
>  * It doesn't lose events because it doesn't actually store events. It has no 
> capacity.
> The motivation behind this channel is that, when using a Taildir Source to 
> collect logs and send them to a remote Flume instance, we typically use 
> File Channel or Memory Channel. Memory Channel is fast, but could lose 
> events. File Channel is durable, but slow. Using a File Channel also means we 
> are writing the same contents to disk twice: once for the log file that 
> Taildir Source is watching and once for the channel data. We don't need 
> to buffer events in a channel because the events are already in the log file 
> and Taildir Source can just read at its own pace.
> Expected use cases are:
>  * Taildir Source --> Synchronous Channel --> Avro Sink
>  * Kinesis Source --> Synchronous Channel --> Avro Sink
>  * Cloud Pub/Sub Source --> Synchronous Channel --> Avro Sink
> In all these cases, the channel doesn't need to buffer events because the 
> source already works like a buffer.
> In [this 
> benchmark|https://github.com/eiiches/flume-synchronous-channel/tree/main/docs/benchmark]
>  that uses Taildir Source + Synchronous Channel, I observed an 84% increase in 
> throughput and a 75-81% reduction in CPU usage compared to File Channel when 
> the event body is 512 bytes.
>  
> ----
>  
> The code is around 220 LOC (excluding tests) and doesn't pull in additional 
> third-party dependencies.
> I can work on a PR, but before doing so, I would like general feedback from 
> the community. I'm wondering if this channel is useful or generic enough to 
> be included in Flume or if it should be kept in a separate repository. What 
> do you think?
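
The hand-off semantics claimed in the description (a put waits until the 
corresponding take has completed) can be sketched roughly as follows. This is 
not the code from the linked repository and does not use Flume's Channel or 
Transaction API; HandoffSketch, Batch, putAndAwait, and takeAndAck are 
hypothetical names, and rollback and failure handling are omitted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration only; this is not the proposed SynchronousChannel.
class HandoffSketch {
    /** A batch of events plus a latch the consumer releases when it is done with the batch. */
    static final class Batch {
        final String[] events;
        final CountDownLatch done = new CountDownLatch(1);

        Batch(String... events) {
            this.events = events;
        }
    }

    // SynchronousQueue has no internal capacity: put() blocks until a consumer calls take().
    private final SynchronousQueue<Batch> handoff = new SynchronousQueue<>();

    /** Producer side: returns only after a consumer has taken and acknowledged the batch. */
    void putAndAwait(String... events) throws InterruptedException {
        Batch batch = new Batch(events);
        handoff.put(batch);  // wait for a consumer to accept the batch
        batch.done.await();  // wait for that consumer to finish with it
    }

    /** Consumer side: takes one batch, "delivers" it, then acknowledges it. */
    int takeAndAck() throws InterruptedException {
        Batch batch = handoff.take();
        int delivered = batch.events.length; // stand-in for sending events downstream
        batch.done.countDown();              // release the waiting producer
        return delivered;
    }

    public static void main(String[] args) throws Exception {
        HandoffSketch channel = new HandoffSketch();
        Thread sink = new Thread(() -> {
            try {
                System.out.println("sink delivered " + channel.takeAndAck() + " events");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();
        channel.putAndAwait("a", "b", "c"); // blocks until the sink thread acknowledges
        System.out.println("source put returned");
        sink.join();
    }
}
```

In this sketch the producer is released only when the consumer acknowledges 
the batch, so what "doesn't lose events" means in practice depends on what the 
consumer does before acknowledging.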



