[ 
https://issues.apache.org/jira/browse/FLUME-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182644#comment-13182644
 ] 

Peter Newcomb commented on FLUME-896:
-------------------------------------

Hi Praveen,

Thanks for the comment.  It is true that this code does not attempt to
address all of those concerns--a conscious choice on my part.  In
what I gather is the spirit of Flume NG, I implemented this as a
simple component designed to be combined with other components in
order to address a broad range of use cases, rather than attempting to
address those use cases directly from within the component itself.
Also, as just one of many implementations of Channel, it need not be
the only way of temporarily storing events in a filesystem.

Your comment makes me realize, however, that I have failed to describe
the broader context in which I see this implementation being used.
Let me sketch out how I'm actually using it.  At the
"agent" tier:

  SOURCE | PseudoTxnMemoryChannel | BatchCollector | FileChannel | AvroSink

And at the "collector" tier:

  AvroSource | FileChannel | BatchSplitter | SynchronousChannel | HDFSEventSink

I've embedded the "agent" pipeline into our existing service (a
specialized web server for collecting data through tracking pixel
hits), so SOURCE is really just that service directly calling put() on
the PseudoTxnMemoryChannel.

BatchCollector and BatchSplitter are effectively both sinks and
sources, though they officially implement the Sink interface, not the
Source interface.  They do what their names imply: the first collects a
set of events as a batch, emitting a single event whose body contains
the entire set of events, while the second takes a batch event created
by the first and re-emits each individual event.  The batches may be
compressed if desired, and transaction semantics are adhered to
throughout.
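
To make the batching idea concrete, here is a minimal Java sketch of
packing several event bodies into one batch body and splitting them
back out.  It is an illustration only, not the actual BatchCollector or
BatchSplitter code: the class name BatchCodec and the length-prefixed
layout are assumptions, and event headers, compression, and Flume's
transaction handling are all left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the wire layout here is an assumption,
// not the format used by the components described above.
final class BatchCodec {
  private BatchCodec() {}

  // Pack event bodies into a single batch body: an event count,
  // followed by each body prefixed with its length in bytes.
  static byte[] pack(List<byte[]> bodies) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(bodies.size());
      for (byte[] body : bodies) {
        out.writeInt(body.length);
        out.write(body);
      }
    }
    return bytes.toByteArray();
  }

  // Reverse of pack(): recover the individual event bodies in order.
  static List<byte[]> unpack(byte[] batch) throws IOException {
    List<byte[]> bodies = new ArrayList<>();
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(batch))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        byte[] body = new byte[in.readInt()];
        in.readFully(body);
        bodies.add(body);
      }
    }
    return bodies;
  }
}
```

In this sketch a collector-style sink would call pack() on the events
taken in one transaction and put() the result as a single event, and a
splitter-style sink would call unpack() and re-emit each body.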

SynchronousChannel is a special in-memory channel based on
SynchronousQueue that is designed to safely mediate between sources
and sinks that may have differing transaction boundaries (i.e., number
of events/transaction).  Unlike MemoryChannel, SynchronousChannel does
not allow a source's transaction to succeed unless the events put()
during that transaction have been take()n by a sink and that sink's
transaction has fully committed.
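
The hand-off rule described above can be sketched in a few lines of
Java.  This is not the SynchronousChannel implementation itself: the
class CommitHandoff and its Taken wrapper are hypothetical names, and
the sketch simplifies to one event per source transaction and does not
implement Flume's Channel or Transaction interfaces.  It only shows a
put() that does not return until the taker has resolved the event.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration of a commit-coupled hand-off.
final class CommitHandoff<E> {

  // What the taker receives: the event plus a way to report the outcome.
  static final class Taken<E> {
    final E event;
    private final CompletableFuture<Boolean> outcome;

    Taken(E event, CompletableFuture<Boolean> outcome) {
      this.event = event;
      this.outcome = outcome;
    }

    void commit() { outcome.complete(true); }
    void rollback() { outcome.complete(false); }
  }

  private final SynchronousQueue<Taken<E>> queue = new SynchronousQueue<>();

  // Blocks until a taker has received the event AND called commit() or
  // rollback().  Returns true only if the taker committed.
  boolean put(E event) throws InterruptedException {
    CompletableFuture<Boolean> outcome = new CompletableFuture<>();
    queue.put(new Taken<>(event, outcome));
    try {
      return outcome.get();
    } catch (ExecutionException e) {
      // Not expected: the future is only ever completed normally.
      throw new IllegalStateException(e);
    }
  }

  // Blocks until a putter offers an event.
  Taken<E> take() throws InterruptedException {
    return queue.take();
  }
}
```

A source-side caller would treat a false return from put() as a failed
transaction and retry or roll back accordingly.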

FYI, I do intend to contribute all of these components, but have been
limited by how much time I can spend on contribution efforts.  If
there's interest I'll try to do it sooner rather than later.

As to the point about porting implementations from other projects:
while it is true that I have not copied an implementation from another
OS project, I have poured into this implementation many years of
experience implementing Flume-like systems, and its mechanisms are
ones that are tried and true, at least when combined with the batching
mechanism I described above.

All of that said, it may simply be that this is not the implementation
envisioned for the Channel implementation named "FileChannel", which
is perfectly fine... I'm not wed to this implementation being named
anything in particular, nor even to its being adopted by Flume at all--I
created it only because the existing JDBC channel is too heavyweight
for our particular application.

-peter
                
> Implement file write ahead log channel
> --------------------------------------
>
>                 Key: FLUME-896
>                 URL: https://issues.apache.org/jira/browse/FLUME-896
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>    Affects Versions: NG alpha 1
>            Reporter: E. Sammer
>            Assignee: Peter Newcomb
>             Fix For: v1.1.0
>
>
> Implement a channel that uses a regular file system and a write ahead log for 
> durable event delivery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
