Hi,

On 02/27/2012 02:44 AM, Brock Noland wrote:
> Hello,
>
> This might be something for the developer guide or it might be
> somewhere and I just missed it.  I feel like we should set down some
> expectations in regards to:
>
> 1) Source behavior when:
>    a) Channel put fails
>    b) Source started but is unable to obtain new events for some reason
> 2) Channel behavior when:
>    a) Channel capacity exceeded
>    b) take when channel is empty
> 3) Sink behavior when:
>    a) Channel take returns null
>    b) Sink cannot write to the downstream location
Totally agree. There is little consistency in implementations right now, and part of the problem is that some of the interfaces aren't documented. We should probably open a JIRA to document the sink and source interfaces, including their failure patterns.

My take on your issues:

Sources:
- Channel put failure is pretty clear-cut: the failure should be returned, and the previous agent should roll back the transaction
- Inability to obtain events should probably be logged at a high level
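To make the first point concrete, here is a sketch of the source-side contract: puts are staged in a transaction, and on ChannelException the transaction is rolled back and failure is signalled to the caller so the upstream agent can roll back its own transaction. The Channel and Transaction classes below are simplified stand-ins I wrote for illustration, not the real Flume interfaces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SourcePutSketch {
    static class ChannelException extends RuntimeException {
        ChannelException(String m) { super(m); }
    }

    /** Bounded in-memory channel with toy transaction semantics. */
    static class Channel {
        final Queue<String> committed = new ArrayDeque<>();
        final int capacity;
        Channel(int capacity) { this.capacity = capacity; }

        class Transaction {
            private final List<String> staged = new ArrayList<>();
            void put(String event) {
                if (committed.size() + staged.size() >= capacity)
                    throw new ChannelException("capacity exceeded");
                staged.add(event);
            }
            void commit()   { committed.addAll(staged); staged.clear(); }
            void rollback() { staged.clear(); }
        }
        Transaction getTransaction() { return new Transaction(); }
    }

    /**
     * Source-side batch append: true = committed; false = rolled back,
     * which should surface as a failure to the upstream agent so it can
     * roll back its own transaction and retry later.
     */
    static boolean appendBatch(Channel channel, List<String> batch) {
        Channel.Transaction tx = channel.getTransaction();
        try {
            for (String event : batch) tx.put(event);
            tx.commit();
            return true;
        } catch (ChannelException e) {
            tx.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        Channel ch = new Channel(2);
        System.out.println(appendBatch(ch, List.of("a", "b"))); // true
        System.out.println(appendBatch(ch, List.of("c")));      // false: channel full
    }
}
```

The key property is that a partial batch never reaches the channel: either every event in the batch is committed, or none are and the failure propagates upstream.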

Channels:
- Only memory channels have a capacity; when it is exceeded, throwing ChannelException seems the clear-cut reaction. For takes, blocking is certainly preferable. That said, I believe it is more important that a sink return backoff when no data was processed.
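The behaviour described here can be sketched with a bounded blocking queue (a toy stand-in, not the actual MemoryChannel code): put waits a bounded time for space and then throws ChannelException, while take blocks for a short timeout before returning null, so a polling sink does not spin at 100% CPU against an empty channel.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BlockingChannelSketch {
    static class ChannelException extends RuntimeException {
        ChannelException(String m) { super(m); }
    }

    private final BlockingQueue<String> queue;
    private final long timeoutMillis;

    BlockingChannelSketch(int capacity, long timeoutMillis) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.timeoutMillis = timeoutMillis;
    }

    void put(String event) {
        try {
            // Wait a bounded time for space, then fail loudly so the
            // source can propagate the failure upstream.
            if (!queue.offer(event, timeoutMillis, TimeUnit.MILLISECONDS))
                throw new ChannelException("capacity exceeded");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ChannelException("interrupted during put");
        }
    }

    /** Blocks up to the timeout; returns null only after waiting, never instantly. */
    String take() {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

A brief blocking take is exactly what keeps a tight sink loop from burning CPU when the channel is empty, which is the pattern behind the FLUME-998 complaint below.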

Sinks:
- Ready if data was processed.
- Backoff if no data was processed/the sink needs breathing space.
- Rollback and backoff if the downstream write failed
- Throw EventDeliveryException if the sink has a serious problem that puts it out of commission (this would result in failover or removal from load balancing). This would be for cases where the downstream is suspected to be unavailable long-term (e.g. an Avro sink has repeatedly failed X times in a row)
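One way to encode the four rules above (a sketch of the contract, not Flume's actual Sink interface): READY when an event was written, BACKOFF when the channel was empty or a single write failed, and EventDeliveryException once the downstream has failed maxFailures times in a row, which a failover or load-balancing processor would act on. The channelTake/downstream hooks and the maxFailures threshold are names I invented for this sketch.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

public class SinkContractSketch {
    enum Status { READY, BACKOFF }

    static class EventDeliveryException extends Exception {
        EventDeliveryException(String m) { super(m); }
    }

    private final Supplier<String> channelTake;   // returns null = channel empty
    private final Consumer<String> downstream;    // throws on write failure
    private final int maxFailures;
    private int consecutiveFailures = 0;

    SinkContractSketch(Supplier<String> channelTake,
                       Consumer<String> downstream, int maxFailures) {
        this.channelTake = channelTake;
        this.downstream = downstream;
        this.maxFailures = maxFailures;
    }

    Status process() throws EventDeliveryException {
        String event = channelTake.get();
        if (event == null) {
            return Status.BACKOFF;        // nothing to do: give breathing space
        }
        try {
            downstream.accept(event);
            consecutiveFailures = 0;
            return Status.READY;          // data was processed
        } catch (RuntimeException e) {
            consecutiveFailures++;        // in real code: also roll back the take
            if (consecutiveFailures >= maxFailures)
                throw new EventDeliveryException(
                    "downstream unavailable after " + consecutiveFailures + " attempts");
            return Status.BACKOFF;        // transient failure: back off and retry
        }
    }
}
```

The counter is what distinguishes a transient hiccup (back off, retry) from the long-term outage case, so the sink processor only fails over once the threshold is crossed.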

I tried to kick off some discussion about this with regard to sink failover too. When developing the failover sink processor I assumed that a failed sink would throw SinkDeliveryException; see https://issues.apache.org/jira/browse/FLUME-981
> This comes about when I noticed some inconsistencies.  For example, a
> take in MemoryChannel blocks for a few seconds by default and
> JDBCChannel does not (FLUME-998). Combined with HDFSEventSink, this
> causes tremendous amounts of CPU consumption. Also, currently if HDFS
> is unavailable for a period, Flume needs to be restarted (FLUME-985).
>
> My general thoughts are based on experience working with JMS-based services.
>
> 1) Source/Channel/Sink should not require a restart when up- or
> downstream services are restarted or become temporarily unavailable.
> 2) Channel capacity being exceeded should not lead to sources dying
> and thus requiring a Flume restart. This will happen when downstream
> destinations slow down for various reasons.
What would be a preferable alternative? Sources with an upstream should be able to signal that the transaction needs to be rolled back. Beyond that, though, throwing away data that couldn't be delivered seems to be the only possibility with a plain channel. Hopefully we can do something like the buffers in Scribe.

> Brock

