My bad for not chiming in on this thread sooner. When we laid out the initial architecture, we made the following assumptions, and I still think most of them are valid:

1. Sources doing put() on a channel should relay back any exceptions they receive from the channel. They should not die or become invalid because of this. If they do, it is a bug in the source implementation.

2. Channels must respect capacity. This is vital for operators to ensure that they can size a system without overwhelming it. Both the memory and JDBC channels support size specification at this time.

3. Channels should never block. This is to ensure that there is no scope for threads deadlocking within the agent due to bugs or invalid state of the system. The chosen alternative to blocking was the notion of the sink runner, which will honor a backoff strategy when necessary. Consequently, the sink implementation should send the correct signal to the runner when it is not able to take events from the channel or deliver events to the downstream destination.

At some point, when we have the basic implementation of Flume working in production to validate all of these semantics, we can start a discussion on how best to change them to accommodate anything new that we discover in the field.

Thanks,
Arvind Prabhakar

On Mon, Feb 27, 2012 at 11:30 AM, Prasad Mujumdar <[email protected]> wrote:

> IMO the blocking vs wait time should be an attribute of the flow and not
> individual component. Perhaps each source/sink/channel should make it
> configurable (with consistent default) so that it can be tweaked per the
> use case. The common attributes like timeout, capacity can be standard
> configurations that each component should support wherever possible.
>
> @Brock, I will try to include the relevant conclusions of this discussion
> in the dev guide.
>
> thanks
> Prasad
>
>
> On Mon, Feb 27, 2012 at 7:35 AM, Peter Newcomb <[email protected]> wrote:
>
>> Juhani, FWIW I agree with most of what you described, based on my reading
>> and use of the codebase. Brock, I agree that these things are not yet
>> adequately documented--especially in terms of Javadocs for the main
>> interfaces: Source, Channel, and Sink. Also, there is enough variation
>> among the various implementations of these interfaces to lead to ambiguous
>> interpretation.
>>
>> One thing I wanted to comment on specifically is Juhani's statement about
>> channel capacity:
>>
>> > Channels:
>> > - Only memory channels have a capacity, but when that is exceeded
>> > ChannelException seems a clearcut reaction
>>
>> Before your recent refactoring of MemoryChannel, put() would block
>> indefinitely if the queue was at capacity--are you suggesting that this was
>> incorrect behavior that should not be allowed? Or just that any such
>> blocking should have a finite duration (similar to take() keep-alive), and
>> throw ChannelException upon timeout?
>>
>> Also, other channels may well have implicit capacities, for instance
>> available space in a database or filesystem partition, though I agree that
>> ChannelException would be appropriate in those cases.
>>
>> -peter
>>
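
For reference, the three assumptions at the top of this message (exceptions relayed to the caller, a hard capacity, and no blocking with the runner backing off instead) could be sketched roughly as below. This is only a minimal illustration and not Flume's actual code: ChannelSketch, BoundedChannel, ChannelFullException, the Status enum, and the backoff numbers are all hypothetical stand-ins chosen for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class ChannelSketch {
    /** Hypothetical stand-in for the exception a channel raises when it is full. */
    static class ChannelFullException extends RuntimeException {
        ChannelFullException(String msg) { super(msg); }
    }

    /** Hypothetical signal a sink hands back to its runner. */
    enum Status { READY, BACKOFF }

    /** Bounded channel that never blocks: put() fails fast, take() returns null. */
    static class BoundedChannel {
        private final ArrayBlockingQueue<String> queue;

        BoundedChannel(int capacity) {
            queue = new ArrayBlockingQueue<>(capacity);
        }

        void put(String event) {
            // offer() returns false immediately instead of waiting for space
            if (!queue.offer(event)) {
                throw new ChannelFullException("channel at capacity");
            }
        }

        String take() {
            // poll() returns null immediately when the queue is empty
            return queue.poll();
        }
    }

    /** One sink step: ask the runner to back off when there is nothing to deliver. */
    static Status process(BoundedChannel channel, StringBuilder delivered) {
        String event = channel.take();
        if (event == null) {
            return Status.BACKOFF;
        }
        delivered.append(event);
        return Status.READY;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedChannel channel = new BoundedChannel(2);
        channel.put("e1");
        channel.put("e2");
        try {
            channel.put("e3"); // exceeds capacity
        } catch (ChannelFullException e) {
            // a source would relay this to whoever handed it the event
            System.out.println("rejected: " + e.getMessage());
        }

        // Runner loop: sleep with a growing delay whenever the sink says BACKOFF.
        StringBuilder delivered = new StringBuilder();
        long sleepMs = 10;
        for (int i = 0; i < 4; i++) {
            if (process(channel, delivered) == Status.BACKOFF) {
                Thread.sleep(sleepMs);
                sleepMs = Math.min(sleepMs * 2, 100);
            } else {
                sleepMs = 10; // reset after a successful delivery
            }
        }
        System.out.println("delivered: " + delivered);
    }
}
```

The point of the sketch is that all waiting happens in the runner loop, never inside put() or take(), so a stuck component cannot hold another thread hostage.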
