Hello, dear Flume devs,

I'm currently experimenting with using Flume's embedded agent to relay ~10 MB/s worth of events to another set of Flume agents. There won't be much memory available, so the SpillableMemoryChannel looked like a good alternative. However:

* This agent needs to receive the events in a "fire and forget" fashion so that it doesn't impact the application's performance. When the SpillableMemoryChannel starts spilling to disk, performance is significantly impaired by the synchronous I/O calls.
* If the channel becomes full, I'd like to discard older events instead of rejecting new ones, as the most recent events are more important in this case (application/usage metrics).
* When the SpillableMemoryChannel contains a lot of data on disk, it takes several minutes to become available after a restart, preventing Flume (and in this case the application that embeds it as well) from accepting events during this period.

With that in mind, I started writing a new channel, which is basically a MemoryChannel with a background thread that starts moving events into a FileChannel before the MemoryChannel becomes full. It also starts the FileChannel in the background, so Flume can begin accepting events into its MemoryChannel immediately. Moreover, it can discard older events from the FileChannel if necessary to accommodate new events spilling from the MemoryChannel.

The code can be found here:
Project: https://github.com/lgvier/flume-async-spillable-mem-channel
Class: https://github.com/lgvier/flume-async-spillable-mem-channel/blob/master/src/main/java/org/apache/flume/channel/AsyncSpillableMemoryChannel.java

I'd appreciate your input on it very much. Does it seem like a good approach? Would there be a better solution for this scenario? I'm also considering using the Flume RPC client with third-party queueing mechanisms, but would prefer an end-to-end Flume solution. Would this channel be useful to the Flume community?

Thank you,
-Geovani
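
P.S. For anyone who would rather skim the idea than read the full class, here is a rough, simplified sketch of the spill-and-discard behaviour described above. It is not the code from the repository and it uses no Flume APIs: the class name AsyncSpillSketch, its methods, and the three capacity parameters are all made up for illustration, and two in-memory deques stand in for the MemoryChannel and the FileChannel. Transactions, the actual disk I/O, and the delayed FileChannel startup are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the design; hypothetical names, no Flume APIs involved. */
class AsyncSpillSketch {
    private final Deque<String> memory = new ArrayDeque<>();   // stands in for the MemoryChannel
    private final Deque<String> overflow = new ArrayDeque<>(); // stands in for the FileChannel
    private final int memoryCapacity;
    private final int spillThreshold;   // spill once memory holds more than this many events
    private final int overflowCapacity;
    private long dropped = 0;

    AsyncSpillSketch(int memoryCapacity, int spillThreshold, int overflowCapacity) {
        if (overflowCapacity < 1 || spillThreshold >= memoryCapacity) {
            throw new IllegalArgumentException("need overflowCapacity >= 1 and spillThreshold < memoryCapacity");
        }
        this.memoryCapacity = memoryCapacity;
        this.spillThreshold = spillThreshold;
        this.overflowCapacity = overflowCapacity;
    }

    /** Producer side: touches memory only; returns false if memory is full. */
    synchronized boolean put(String event) {
        if (memory.size() >= memoryCapacity) {
            return false;
        }
        memory.addLast(event);
        return true;
    }

    /** One pass of the spiller: move the oldest in-memory events to overflow. */
    synchronized int spillOnce() {
        int moved = 0;
        while (memory.size() > spillThreshold) {
            if (overflow.size() >= overflowCapacity) {
                overflow.pollFirst(); // overflow is full: discard its oldest event
                dropped++;
            }
            overflow.addLast(memory.pollFirst());
            moved++;
        }
        return moved;
    }

    /** Consumer side: overflow always holds the oldest events, so drain it first. */
    synchronized String take() {
        String event = overflow.pollFirst();
        return event != null ? event : memory.pollFirst();
    }

    synchronized long droppedCount() {
        return dropped;
    }

    /** Runs spillOnce() periodically on a daemon thread until interrupted. */
    Thread startSpiller(long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    spillOnce();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "spiller");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

The point of the threshold is that the spiller starts moving events while the memory queue still has headroom, so put() only fails if producers outrun the spiller for long enough to fill the remaining space. In this sketch the producer and the spiller share one lock, which a real implementation would want to avoid holding during disk writes.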
