On Tue, May 8, 2012 at 10:14 AM, S Ahmed <[email protected]> wrote:
> Greetings,
>
> Just looking over the code a bit, and I really appreciate the level of
> comments in the code!
>
> I am interested in learning how the generic design works when it comes to
> this (with my assumptions, please correct me where appropriate):
>
> 1. When data is being stored in-memory, it is stored in some sort of
> collection like a ConcurrentHashMap. So this in-memory structure gets
> appended to until a certain criterion is met (time-based, # of items, size
> of data), then it gets flushed/sunk to one of the many implementations.

There are two implementations for storing events in memory.

There is MemoryChannel, which uses a LinkedBlockingDeque:

https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/channel/MemoryChannel.java
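To make the MemoryChannel idea a bit more concrete, here is a rough sketch of a bounded in-memory buffer built on java.util.concurrent.LinkedBlockingDeque. The class SimpleMemoryChannel and its methods are made up for illustration; this is not Flume's MemoryChannel code, which (as far as I know) also layers transactions and capacity accounting on top.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch, NOT Flume's MemoryChannel: events sit in a bounded
// LinkedBlockingDeque until a sink takes them.
class SimpleMemoryChannel<E> {
    private final LinkedBlockingDeque<E> queue;

    SimpleMemoryChannel(int capacity) {
        // Bounded deque: once capacity is reached, put() is refused.
        this.queue = new LinkedBlockingDeque<>(capacity);
    }

    // Source side: returns false instead of blocking when the buffer is full.
    boolean put(E event) {
        return queue.offerLast(event);
    }

    // Sink side: returns null when there is nothing to take.
    E take() {
        return queue.pollFirst();
    }

    // Because it is a deque, an event can be pushed back onto the head,
    // e.g. if the sink failed to deliver it.
    boolean putBack(E event) {
        return queue.offerFirst(event);
    }

    int size() {
        return queue.size();
    }
}
```

The deque's own internal locking is what lets a source put and a sink take from different threads at the same time; the timed variants (offerLast/pollFirst with a timeout) could be used instead if callers should wait rather than fail fast.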

And there is FileChannel, which uses a circular array:

https://github.com/apache/flume/blob/trunk/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
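For the FileChannel side, here is a rough sketch of a fixed-capacity circular array. My understanding is that FlumeEventQueue stores 8-byte pointers into the on-disk log rather than the events themselves, so this version holds longs. The class LongRingBuffer and its method names are hypothetical and heavily simplified (no persistence, one coarse lock).

```java
// Hypothetical sketch, NOT Flume's FlumeEventQueue: a fixed-capacity circular
// array of longs (think "pointers" to events stored elsewhere).
class LongRingBuffer {
    private final long[] slots;
    private int head = 0; // index of the oldest element
    private int size = 0; // number of elements currently stored

    LongRingBuffer(int capacity) {
        slots = new long[capacity];
    }

    // Append at the tail; returns false when full.
    synchronized boolean addTail(long value) {
        if (size == slots.length) {
            return false;
        }
        slots[(head + size) % slots.length] = value; // wraps past the end of the array
        size++;
        return true;
    }

    // Remove from the head; returns null when empty.
    synchronized Long removeHead() {
        if (size == 0) {
            return null;
        }
        long value = slots[head];
        head = (head + 1) % slots.length;
        size--;
        return value;
    }

    // Push back onto the head, e.g. to undo a take; returns false when full.
    synchronized boolean addHead(long value) {
        if (size == slots.length) {
            return false;
        }
        head = (head - 1 + slots.length) % slots.length;
        slots[head] = value;
        size++;
        return true;
    }

    synchronized int size() {
        return size;
    }
}
```

A preallocated array has a fixed, predictable footprint, which I would guess makes it easier to checkpoint to disk than a linked structure.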


In both cases, the Channel only stores data until a Sink takes it.
Sinks/Sources implement their own batching.
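Since batching lives in the Sink (or Source) rather than the Channel, a sink's processing pass might look roughly like this. BatchingSinkSketch, process() and the "written" list (a stand-in for HDFS or whatever the real destination is) are hypothetical. A real Flume sink would take events inside a Transaction so a failed write could be rolled back, which a plain drainTo does not give you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical sink-side batching: the channel only buffers events; the sink
// decides how many to take per write.
class BatchingSinkSketch {
    private final BlockingQueue<String> channel;
    private final int batchSize;
    // Stand-in for the real destination (HDFS, another agent, ...).
    final List<List<String>> written = new ArrayList<>();

    BatchingSinkSketch(BlockingQueue<String> channel, int batchSize) {
        this.channel = channel;
        this.batchSize = batchSize;
    }

    // One processing pass: take up to batchSize events and write them together.
    int process() {
        List<String> batch = new ArrayList<>(batchSize);
        channel.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            written.add(batch);
        }
        return batch.size();
    }
}
```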

>
> 2. How does this collection get sunk while still accepting new data? I
> am also guessing that this process is abstracted, so future implementations
> can just build on this functionality and not have to worry about
> concurrency issues.

Sources use a Transaction to put data on the Channel, and Sinks use one
to take data off. Neither has to care which Channel implementation it
is talking to. If you were writing a Channel, you would have to handle
the concurrency problems described above yourself.
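Here is a toy model of the take side of that pattern. In Flume the real interfaces are org.apache.flume.Channel and org.apache.flume.Transaction (begin/commit/rollback/close); the ToyChannel, ToyTransaction, ToyMemoryChannel and ToySink types below are simplified stand-ins made up for illustration, not that API. The point is that ToySink only talks to the ToyChannel interface, while the channel owns the concurrency and rollback logic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

// Toy stand-ins, NOT org.apache.flume.Channel / org.apache.flume.Transaction.
interface ToyTransaction {
    String take();     // tentatively remove one event (null if none)
    void commit();     // make the takes permanent
    void rollback();   // give taken events back to the channel
}

interface ToyChannel {
    ToyTransaction begin();
}

class ToyMemoryChannel implements ToyChannel {
    final LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>();

    @Override
    public ToyTransaction begin() {
        Deque<String> taken = new ArrayDeque<>(); // per-transaction take list
        return new ToyTransaction() {
            @Override
            public String take() {
                String event = queue.pollFirst();
                if (event != null) {
                    taken.addLast(event);
                }
                return event;
            }

            @Override
            public void commit() {
                taken.clear();
            }

            @Override
            public void rollback() {
                // Re-insert newest first so the original order is restored at the head.
                while (!taken.isEmpty()) {
                    queue.addFirst(taken.removeLast());
                }
            }
        };
    }
}

class ToySink {
    // Works with any ToyChannel; it never touches the underlying queue.
    static List<String> drainOnce(ToyChannel channel, int max, boolean failWrite) {
        ToyTransaction tx = channel.begin();
        List<String> batch = new ArrayList<>();
        try {
            String event;
            while (batch.size() < max && (event = tx.take()) != null) {
                batch.add(event);
            }
            if (failWrite) {
                throw new IllegalStateException("simulated write failure");
            }
            tx.commit();
            return batch;
        } catch (RuntimeException e) {
            tx.rollback();
            return new ArrayList<>();
        }
    }
}
```

A real channel would also buffer puts until commit and enforce capacity; this toy only shows why a failed sink write leaves the events in the channel.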

Brock

-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
