It may be better to make this async behavior a policy of the spillable channel rather than introduce a new channel.

-roshan
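To illustrate the "policy" idea, here is a minimal sketch of an overflow policy on a bounded buffer. It assumes a toy in-memory queue rather than Flume's actual Channel or transaction API. The names `OverflowPolicy`, `BoundedEventBuffer`, `REJECT_NEW`, and `DROP_OLDEST` are hypothetical and do not exist in Flume.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical overflow policies; these names are not part of Flume. */
enum OverflowPolicy { REJECT_NEW, DROP_OLDEST }

/** Toy bounded buffer standing in for a channel's queue; not Flume's Channel API. */
final class BoundedEventBuffer<E> {
    private final Deque<E> events = new ArrayDeque<>();
    private final int capacity;
    private final OverflowPolicy policy;
    private long dropped = 0;

    BoundedEventBuffer(int capacity, OverflowPolicy policy) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.policy = policy;
    }

    /** Returns true if the event was accepted, false if it was rejected. */
    synchronized boolean put(E event) {
        if (events.size() >= capacity) {
            if (policy == OverflowPolicy.REJECT_NEW) {
                return false;          // keep old events, refuse the new one
            }
            events.pollFirst();        // DROP_OLDEST: evict the oldest event
            dropped++;
        }
        events.addLast(event);
        return true;
    }

    /** Returns the oldest buffered event, or null if the buffer is empty. */
    synchronized E take() { return events.pollFirst(); }

    synchronized int size() { return events.size(); }

    synchronized long droppedCount() { return dropped; }
}
```

With `DROP_OLDEST` the producer is never refused and the most recent events survive, which matches the metrics use case in the quoted message. With `REJECT_NEW` the buffer behaves like a conventional full channel.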
On Mon, Sep 8, 2014 at 7:50 AM, Luiz Geovani Vier <[email protected]> wrote:

> Hello, dear Flume devs,
>
> I'm currently experimenting with using flume's embedded agent to relay
> ~10MB/s worth of events to another set of flume agents.
> There won't be much memory available, so the SpillableMemoryChannel looked
> like a good alternative, however:
>
> * This agent needs to receive the events in a "fire and forget" approach so
> that it doesn't impact the application's performance.
> When the SpillableMemoryChannel starts spilling to disk, the performance is
> significantly impaired due to the synchronous I/O calls.
> * If the channel becomes full, I'd like to discard older events instead of
> rejecting new ones, as the current events are more important in this case
> (application/usage metrics).
> * When the SpillableMemoryChannel contains a lot of data on disk, it takes
> several minutes to become available after a restart, preventing Flume (and
> in this case the application that is embedding it as well) from accepting
> events during this period.
>
> With that in mind, I started writing a new channel which is basically a
> MemoryChannel with a background thread that starts moving events into a
> FileChannel before the MemoryChannel becomes full.
> It also starts the FileChannel in background, so Flume can begin accepting
> events into its MemoryChannel immediately.
> Moreover, it can discard older events from the FileChannel if necessary to
> accommodate new events spilling from the MemoryChannel.
>
> The code can be found here:
> Project: https://github.com/lgvier/flume-async-spillable-mem-channel
> Class:
>
> https://github.com/lgvier/flume-async-spillable-mem-channel/blob/master/src/main/java/org/apache/flume/channel/AsyncSpillableMemoryChannel.java
>
> I'd appreciate your input on it very much.
> Does it seem like a good approach?
> Would there be a better solution for this scenario?
> I'm also considering using the Flume RPC client with third-party queueing
> mechanisms, but would prefer an end-to-end flume solution.
> Is it useful for the Flume community?
>
> Thank you,
> -Geovani
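The design described in the quoted message (producers write to memory only, a background thread moves events to a slower tier, and the oldest spilled events are discarded when that tier is full) can be sketched roughly as follows. This is not the code from the linked repository and does not use Flume's MemoryChannel or FileChannel. `SpillStore`, `InMemorySpillStore`, and `AsyncSpillBuffer` are hypothetical names, the store here is an in-memory stand-in for a disk-backed one, and the sketch places no bound on the in-memory queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for the on-disk tier (a FileChannel in the proposal). */
interface SpillStore<E> {
    void append(E event);  // would block on I/O in a real implementation
    E pollOldest();        // returns null when the store is empty
    int size();
}

/** In-memory SpillStore, used only to keep this sketch runnable. */
final class InMemorySpillStore<E> implements SpillStore<E> {
    private final Deque<E> queue = new ArrayDeque<>();
    public synchronized void append(E event) { queue.addLast(event); }
    public synchronized E pollOldest() { return queue.pollFirst(); }
    public synchronized int size() { return queue.size(); }
}

final class AsyncSpillBuffer<E> {
    private final Deque<E> memory = new ArrayDeque<>();
    private final SpillStore<E> store;
    private final int spillThreshold;  // spill once memory holds more than this
    private final int storeCapacity;   // maximum number of events kept in the store

    AsyncSpillBuffer(SpillStore<E> store, int spillThreshold, int storeCapacity) {
        this.store = store;
        this.spillThreshold = spillThreshold;
        this.storeCapacity = storeCapacity;
    }

    /** Producer path: touches the in-memory queue only, never the store. */
    void put(E event) {
        synchronized (memory) { memory.addLast(event); }
    }

    /** Moves the oldest in-memory events above the threshold into the store. */
    int spillOnce() {
        List<E> batch = new ArrayList<>();
        synchronized (memory) {
            while (memory.size() > spillThreshold) {
                batch.add(memory.pollFirst());
            }
        }
        // Store writes happen outside the memory lock so put() is not delayed.
        for (E event : batch) {
            if (store.size() >= storeCapacity) {
                store.pollOldest();  // discard the oldest spilled event
            }
            store.append(event);
        }
        return batch.size();
    }

    /** Consumer path: spilled (older) events first, then in-memory events. */
    E take() {
        E event = store.pollOldest();
        if (event != null) {
            return event;
        }
        synchronized (memory) { return memory.pollFirst(); }
    }

    /** Starts a daemon thread that calls spillOnce() at a fixed interval. */
    Thread startSpiller(long intervalMillis) {
        Thread spiller = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    spillOnce();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // exit quietly on interrupt
            }
        }, "spiller");
        spiller.setDaemon(true);
        spiller.start();
        return spiller;
    }
}
```

Because a batch leaves the memory lock before it reaches the store, event ordering in this sketch is best effort while a spill is in flight. Durability, transactions, and the background startup of the disk tier mentioned in the message are left out.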
