Joe, thanks very much for reporting these issues. I believe RecoverableMemoryChannel is slated to be removed from Flume, since FileChannel is faster and offers stronger reliability guarantees. I thought Brock had filed a JIRA for that removal, but I can't seem to find the ticket number right now.
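For anyone wanting to try the FileChannel route, a channel definition along these lines should work; note that the agent name, channel name, and directory paths below are hypothetical placeholders, not taken from Joe's setup:

```properties
# Hypothetical agent "agent1" with a single file-backed channel.
agent1.channels = ch1
agent1.channels.ch1.type = file
# FileChannel persists events to disk; put both directories on stable storage.
agent1.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
agent1.channels.ch1.dataDirs = /var/lib/flume/data
# Maximum number of events the channel can hold.
agent1.channels.ch1.capacity = 100000
```

Because events are written to disk before the put transaction commits, a downstream outage like the one described below fills the channel up to its configured capacity rather than dropping committed events.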
Mike

On Wednesday, May 23, 2012 at 8:12 AM, Joe Crobak wrote:

> I was investigating some failure scenarios with the RecoverableMemoryChannel
> and the JDBC Channel. The first scenario is that a local flume agent writes
> to a downstream flume agent, but that downstream agent is unavailable.
>
> For the RecoverableMemoryChannel, I was forcing the in-memory queue with a
> capacity of 5k events to fill up. After writing a batch of 10k events and
> waiting several seconds, I re-enabled the downstream node. I was hoping that
> all 10k events would make it through, but I experienced data loss. This is
> consistent with the exception I was seeing in the logs:
> "org.apache.flume.ChannelException: Space for commit to queue couldn't be
> acquired Sinks are likely not keeping up with sources, or the buffer size is
> too tight"
>
> With the JDBC channel, I ran into FLUME-1224, which causes the local agent to
> crash. You can see more details about my configuration as part of that jira.
>
> My question is two-fold: 1) is it a design goal of the
> RecoverableMemoryChannel to avoid data loss in this scenario? The
> documentation about it is very scant, and it wasn't clear to me from a quick
> look at the code. 2) is there some other configuration of channels that can
> avoid data loss in this scenario?
>
> It also seems that there's a need for a MemoryChannel that starts spilling to
> disk when it overflows, rather than writing every event to disk. Is that in
> the works or should I file a jira?
>
> Thanks,
> Joe