I was investigating some failure scenarios with the
RecoverableMemoryChannel and the JDBC channel. The first scenario is that
a local Flume agent writes to a downstream Flume agent, but that downstream
agent is unavailable.

For the RecoverableMemoryChannel, I forced the in-memory queue, which had a
capacity of 5k events, to fill up by writing a batch of 10k events. After
waiting several seconds, I re-enabled the downstream node. I was hoping
that all 10k events would make it through, but I experienced data loss.
This is consistent with the exception I was seeing in the logs:
"org.apache.flume.ChannelException: Space for commit to queue couldn't be
acquired Sinks are likely not keeping up with sources, or the buffer size
is too tight"
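
For reference, here is roughly the channel and sink configuration I was
testing with (component names and the downstream host are placeholders,
and I'm reciting the RecoverableMemoryChannel property names from memory,
so treat them as approximate):

  agent1.channels.ch1.type = org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
  # in-memory queue limited to 5k events; the 10k batch overflows this
  agent1.channels.ch1.capacity = 5000

  agent1.sinks.avro1.type = avro
  agent1.sinks.avro1.channel = ch1
  # downstream agent that I took offline during the test
  agent1.sinks.avro1.hostname = downstream-host
  agent1.sinks.avro1.port = 4141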

With the JDBC channel, I ran into FLUME-1224, which causes the local agent
to crash. You can find more details about my configuration in that JIRA.

My question is twofold: 1) Is it a design goal of the
RecoverableMemoryChannel to avoid data loss in this scenario? The
documentation on it is scant, and it wasn't clear to me from a quick look
at the code. 2) Is there some other channel configuration that can avoid
data loss in this scenario?
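
For instance, I'm wondering whether a durable channel like the
FileChannel, configured along these lines, would be expected to survive
this scenario without loss (paths are placeholders; this is just a sketch,
not a config I've verified):

  agent1.channels.ch1.type = file
  # checkpoint and data directories must be on durable storage
  agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
  agent1.channels.ch1.dataDirs = /var/flume/data
  # large enough to absorb the whole 10k-event backlog
  agent1.channels.ch1.capacity = 100000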

It also seems that there's a need for a MemoryChannel that starts spilling
to disk only when it overflows, rather than writing every event to disk; a
sketch of what I mean is below. Is that in the works, or should I file a
JIRA?
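
To make the idea concrete, something like this, where events spill to disk
only past the in-memory limit (the channel type and property names here
are purely hypothetical):

  agent1.channels.ch1.type = spillablememory      # hypothetical channel type
  agent1.channels.ch1.memoryCapacity = 5000       # events held in RAM
  agent1.channels.ch1.overflowCapacity = 1000000  # events allowed to spill to disk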

Thanks,
Joe
