[ https://issues.apache.org/jira/browse/FLUME-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613083#comment-17613083 ]
Ralph Goers commented on FLUME-3439:
------------------------------------

OK. I do see that the put commit blocks until the take commit is done. While I see now that it does function as you describe, I can envision many use cases in which its use would not be suitable, as this essentially just makes Flume a synchronous proxy between the originator of the data and the destination.

> Introduce SynchronousChannel, a fast disk-less channel that doesn't lose events
> -------------------------------------------------------------------------------
>
>                 Key: FLUME-3439
>                 URL: https://issues.apache.org/jira/browse/FLUME-3439
>             Project: Flume
>          Issue Type: Improvement
>          Components: Channel
>            Reporter: Eiichi Sato
>            Priority: Major
>
> Recently, I implemented
> [SynchronousChannel|https://github.com/eiiches/flume-synchronous-channel],
> in which every transaction that puts events waits for the corresponding
> transactions that take the events to complete.
> * It's fast because it doesn't use disks.
> * It doesn't lose events because it doesn't actually store events. It has no
> capacity.
>
> The motivation behind this channel is that, when using a Taildir Source to
> collect logs and send them to a remote Flume instance, we typically use
> File Channel or Memory Channel. Memory Channel is fast, but could lose
> events. File Channel is durable, but slow. Using a File Channel also means we
> are writing the same contents to disk twice: first for the log file that
> Taildir Source is watching, and second for the channel data. We don't need
> to buffer events in a channel because the events are already there in a log
> file and Taildir Source can just read at its own pace.
>
> Expected use cases are:
> * Taildir Source --> Synchronous Channel --> Avro Sink
> * Kinesis Source --> Synchronous Channel --> Avro Sink
> * Cloud Pub/Sub Source --> Synchronous Channel --> Avro Sink
>
> In all these cases, the channel doesn't need to buffer events because the
> source already works like a buffer.
> In [this benchmark|https://github.com/eiiches/flume-synchronous-channel/tree/main/docs/benchmark]
> that uses Taildir Source + Synchronous Channel, I observed an 84% increase in
> throughput and a 75-81% reduction in CPU usage compared to File Channel when
> the event body is 512 bytes.
>
> ----
>
> The code is around 220 LOC (excluding tests) and doesn't pull in additional
> third-party dependencies.
> I can work on a PR, but before doing so, I want general feedback from the
> community. I'm wondering if this channel is useful or generic enough to be
> included in Flume or if it should be kept in a separate repository. What do
> you think?
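
For readers following the thread, the behaviour discussed above (a put commit that blocks until the matching take commit is done, with nothing stored in between) can be illustrated with a minimal sketch built only on `java.util.concurrent`. This is not the code from the linked repository and it does not use Flume's channel or transaction interfaces; the class `HandoffSketch` and the methods `putAndCommit`, `takeAndCommit` and `demo` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration only: not Flume's API and not the linked implementation.
public class HandoffSketch {
    /** An event paired with a latch that the taker releases once it has committed. */
    static final class Handoff {
        final String event;
        final CountDownLatch committed = new CountDownLatch(1);
        Handoff(String event) { this.event = event; }
    }

    // Zero-capacity queue: an element only changes hands when both sides are present.
    private final SynchronousQueue<Handoff> queue = new SynchronousQueue<>();
    final List<String> log = new CopyOnWriteArrayList<>();

    /** Source side: returns only after the sink side has committed the event. */
    void putAndCommit(String event) throws InterruptedException {
        Handoff h = new Handoff(event);
        queue.put(h);          // blocks until a taker arrives
        h.committed.await();   // blocks until the taker has committed
        log.add("put commit returned: " + event);
    }

    /** Sink side: take the event, deliver it (elided here), then commit. */
    void takeAndCommit() throws InterruptedException {
        Handoff h = queue.take();
        log.add("take committed: " + h.event);
        h.committed.countDown(); // releases the waiting put side
    }

    /** Runs one put/take pair on two threads and returns the observed order. */
    static List<String> demo() {
        HandoffSketch channel = new HandoffSketch();
        Thread sink = new Thread(() -> {
            try {
                channel.takeAndCommit();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();
        try {
            channel.putAndCommit("event-1");
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(channel.log);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the source thread cannot get ahead of the sink, which is the trade-off raised in the comment: no event is lost in the channel, but the source is throttled to the speed of the destination.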