[jira] [Commented] (FLUME-1045) Proposal to support disk based spooling

Patrick Wendell (JIRA) Wed, 08 Aug 2012 10:30:23 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431232#comment-13431232
 ]


Patrick Wendell commented on FLUME-1045:
----------------------------------------

A major design goal in flume-ng is to keep the channel implementations and 
interfaces as simple as possible. Trying to enact this as a combination of 
channels would require two things that I don't think will fly:

- Adding logic about composing channels (if they are to be combined within an 
agent)
- Creating tune-able reliability levels in the FileChannel

The goal is for each channel to have clear, immutable durability semantics.
Of course, there would be some benefits in trying to merge memory + file
channel, but they would be outweighed by the complexity costs and general
deviation from Flume's design.

As a result, I would be in favor of having a new channel which basically enacts
memory + disk spilling and offers "best effort" semantics, but loses data in
only a subset of the cases where the current MemoryChannel loses data (i.e.
"better effort"). This is the reason behind FLUME-1227 and I feel somewhat
strongly this is the right way to go.
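
To make the "memory + disk spilling" idea concrete, here is a minimal Python
sketch. It is not Flume code and not the FLUME-1227 design: the class name
SpillableQueue, its methods, and the newline-delimited JSON spill file are all
hypothetical, chosen only to illustrate the behavior. Events stay in memory
until a fixed capacity is reached, then overflow to a file; the sketch does no
fsync and no crash recovery.

```python
import collections
import json
import os


class SpillableQueue:
    """Hypothetical FIFO buffer: memory first, overflow appended to a file."""

    def __init__(self, memory_capacity, spill_path):
        self.memory_capacity = memory_capacity
        self.spill_path = spill_path
        self._memory = collections.deque()
        self._read_offset = 0  # byte offset of the next unread spilled event
        self._spilled = 0      # spilled events not yet taken

    def put(self, event):
        # While anything is still on disk, keep appending to disk so that
        # events come back out in the order they went in.
        if self._spilled == 0 and len(self._memory) < self.memory_capacity:
            self._memory.append(event)
            return
        with open(self.spill_path, "ab") as f:
            f.write(json.dumps(event).encode("utf-8") + b"\n")
        self._spilled += 1

    def take(self):
        # Memory always holds the oldest events, so drain it first.
        if self._memory:
            return self._memory.popleft()
        if self._spilled == 0:
            return None
        with open(self.spill_path, "rb") as f:
            f.seek(self._read_offset)
            line = f.readline()
            self._read_offset = f.tell()
        self._spilled -= 1
        if self._spilled == 0:
            # Everything on disk has been consumed; start a fresh spill file.
            os.remove(self.spill_path)
            self._read_offset = 0
        return json.loads(line)
```

With memory_capacity=2, putting five events leaves two in memory and three in
the spill file, and take() returns all five in insertion order. In this sketch
a crash loses at most the in-memory events plus any spilled writes the OS had
not yet flushed, which is the sense in which such a buffer would be "better
effort" than a purely in-memory one.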

Inder,

Your proposal seems to be to run a two-tier agent topology in which the second
tier acts as a "spillover" disk-based channel. Frankly, that's not a bad ad-hoc
solution to this problem. It would certainly be more useful if the FileChannel
had tune-able durability guarantees, as you point out.
                
> Proposal to support disk based spooling
> ---------------------------------------
>
>                 Key: FLUME-1045
>                 URL: https://issues.apache.org/jira/browse/FLUME-1045
>             Project: Flume
>          Issue Type: New Feature
>    Affects Versions: v1.0.0
>            Reporter: Inder SIngh
>            Priority: Minor
>              Labels: patch
>         Attachments: FLUME-1045-1.patch, FLUME-1045-2.patch
>
>
> 1. Problem Description 
> A sink being unavailable at any stage in the pipeline causes it to back off 
> and retry after a while. Channels associated with such sinks start buffering 
> data, with the caveat that if you are using a memory channel it can result in 
> a domino effect on the entire pipeline. There could be legitimate downtimes, 
> e.g. the HDFS sink being down for NameNode maintenance or Hadoop upgrades. 
> 2. Why not use a durable channel (JDBC, FileChannel)?
> We want high throughput and support for sink downtime as a first-class use case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
