[ https://issues.apache.org/jira/browse/FLUME-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431232#comment-13431232 ]
Patrick Wendell commented on FLUME-1045:
----------------------------------------
A major design goal in flume-ng is to keep the channel implementations and
interfaces as simple as possible. Trying to enact this as a combination of
channels would require two things that I don't think will fly:
- Adding logic about composing channels (if they are to be combined within an
agent)
- Creating tune-able reliability levels in the FileChannel
The goal is for each channel to have clear, immutable durability semantics.
Of course, there would be some benefits in trying to merge memory + file
channel, but it would be outweighed by the complexity costs and general
deviation from Flume's design.
As a result, I would be in favor of a new channel that keeps events in memory
and spills to disk when memory fills up. It would offer "best effort"
semantics, but lose data in only a subset of the cases where the current
MemoryChannel loses data (i.e. "better effort"). This is the reason behind
FLUME-1227, and I feel somewhat strongly that this is the right way to go.
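To make the memory + disk spilling idea concrete, here is a minimal sketch of
the buffering logic only. It is not the FLUME-1227 implementation and does not
use the Flume Channel API; the class name SpillableQueueSketch and its methods
are hypothetical. It assumes events are opaque byte arrays, keeps FIFO order by
routing new events to disk for as long as the spill file is non-empty, and has
no transactions, fsync, or recovery after a restart.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;

// Hypothetical sketch: a bounded in-memory queue that overflows to a file.
final class SpillableQueueSketch implements AutoCloseable {
    private final ArrayDeque<byte[]> memory = new ArrayDeque<>();
    private final int memoryCapacity;
    private final RandomAccessFile spill;
    private long readPos = 0;   // offset of the next spilled record to read
    private long writePos = 0;  // offset where the next spilled record goes

    SpillableQueueSketch(int memoryCapacity, File spillFile) throws IOException {
        this.memoryCapacity = memoryCapacity;
        this.spill = new RandomAccessFile(spillFile, "rw");
        this.spill.setLength(0); // assumption: start empty, no crash recovery
    }

    // Memory is used only while nothing is spilled, so everything in memory
    // is always older than everything on disk (FIFO is preserved).
    synchronized void put(byte[] event) throws IOException {
        if (readPos == writePos && memory.size() < memoryCapacity) {
            memory.addLast(event);
            return;
        }
        spill.seek(writePos);
        spill.writeInt(event.length); // length-prefixed record
        spill.write(event);
        writePos = spill.getFilePointer();
    }

    // Returns the oldest event, or null if the queue is empty.
    synchronized byte[] take() throws IOException {
        if (!memory.isEmpty()) {
            return memory.pollFirst();
        }
        if (readPos == writePos) {
            return null;
        }
        spill.seek(readPos);
        byte[] event = new byte[spill.readInt()];
        spill.readFully(event);
        readPos = spill.getFilePointer();
        if (readPos == writePos) { // disk drained: reclaim the space
            spill.setLength(0);
            readPos = 0;
            writePos = 0;
        }
        return event;
    }

    // Bytes currently waiting on disk.
    synchronized long spilledBytes() {
        return writePos - readPos;
    }

    @Override
    public synchronized void close() throws IOException {
        spill.close();
    }
}
```

In this sketch, events held in memory are still lost if the process dies,
which is the "best effort" part. A real channel would also need to reread the
spill file on restart for the spilled portion to survive.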
Inder,
Your proposal seems to be to run a two-level agent topology where a second
tier acts as a "spillover" disk-based channel. Frankly, that's not a bad ad-hoc
solution to this problem. It would certainly be more useful if the FileChannel
had tune-able durability guarantees, as you point out.
> Proposal to support disk based spooling
> ---------------------------------------
>
> Key: FLUME-1045
> URL: https://issues.apache.org/jira/browse/FLUME-1045
> Project: Flume
> Issue Type: New Feature
> Affects Versions: v1.0.0
> Reporter: Inder SIngh
> Priority: Minor
> Labels: patch
> Attachments: FLUME-1045-1.patch, FLUME-1045-2.patch
>
>
> 1. Problem Description
> A sink being unavailable at any stage in the pipeline causes it to back off
> and retry after a while. Channels associated with such sinks start buffering
> data, with the caveat that if you are using a memory channel it can result in
> a domino effect on the entire pipeline. There could be legitimate downtime,
> e.g. the HDFS sink being down for NameNode maintenance or Hadoop upgrades.
> 2. Why not use a durable channel (JDBC, FileChannel)?
> We want high throughput and support for sink downtime as a first-class use case.