[ https://issues.apache.org/jira/browse/FLUME-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238295#comment-13238295 ]

Sharad Agarwal commented on FLUME-1045:
---------------------------------------

bq. as it violates the transactional exchange invariant of the design

Some systems have very high throughput requirements and relaxed transactional
needs. Typically these applications want the system to run at very high
throughput and, in case of failures, are OK with losing or replaying a small
number of events.
FileChannel aims to be both fully transactional and high throughput; however, it
will be IO/disk bound.

1. I am curious what the right way is, in the current Flume architecture, to
trade off transactional guarantees for very high throughput while still
providing a certain degree of reliability in case the next link is down.

2. One solution I have in mind, where the IO cost is incurred only on failures
(i.e. when the memory buffer overflows) and everything stays transactional:
Wrap the MemoryChannel and FileChannel in a new channel, say
SpoolingMemoryChannel. Events normally flow through the memory channel; once the
memory channel reaches its buffer capacity, further events are spooled into the
FileChannel. Since both underlying channels are transactional,
SpoolingMemoryChannel can also be made transactional fairly easily.
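To make the routing idea in point 2 concrete, here is a minimal sketch under stated assumptions. `SimpleChannel`, `BoundedMemoryChannel`, `OverflowChannel` and `SpoolingChannel` are hypothetical names for illustration only; they are not Flume's real `Channel`/`Transaction` API, transactions are left out entirely, and the overflow side is an in-memory stand-in for what would really be a disk-backed FileChannel. One design choice is my own assumption rather than something stated above: once anything has been spooled, new events keep going to the overflow channel until it drains, so events still come out in FIFO order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SpoolingMemoryChannelSketch {

  /** Hypothetical minimal channel contract (NOT Flume's Channel interface). */
  interface SimpleChannel {
    /** Returns false if the channel is full. */
    boolean offer(String event);
    /** Returns null if the channel is empty. */
    String poll();
    boolean isEmpty();
  }

  /** Bounded in-memory queue standing in for MemoryChannel. */
  static class BoundedMemoryChannel implements SimpleChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int capacity;

    BoundedMemoryChannel(int capacity) { this.capacity = capacity; }

    public boolean offer(String event) {
      if (queue.size() >= capacity) return false; // buffer capacity reached
      queue.addLast(event);
      return true;
    }
    public String poll() { return queue.pollFirst(); }
    public boolean isEmpty() { return queue.isEmpty(); }
  }

  /** Stand-in for FileChannel: a real one would persist events to disk. */
  static class OverflowChannel implements SimpleChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    int spooled = 0; // counts events that took the slow (disk) path

    public boolean offer(String event) { queue.addLast(event); spooled++; return true; }
    public String poll() { return queue.pollFirst(); }
    public boolean isEmpty() { return queue.isEmpty(); }
  }

  /** Hypothetical SpoolingMemoryChannel: memory first, spill on overflow. */
  static class SpoolingChannel implements SimpleChannel {
    private final SimpleChannel memory;
    private final SimpleChannel overflow;

    SpoolingChannel(SimpleChannel memory, SimpleChannel overflow) {
      this.memory = memory;
      this.overflow = overflow;
    }

    public boolean offer(String event) {
      // Fast path only while nothing is spooled; otherwise keep spooling
      // so that every memory event stays older than every spooled event.
      if (overflow.isEmpty() && memory.offer(event)) return true;
      return overflow.offer(event);
    }

    public String poll() {
      // Memory always holds the oldest events, so drain it first.
      String e = memory.poll();
      return e != null ? e : overflow.poll();
    }

    public boolean isEmpty() { return memory.isEmpty() && overflow.isEmpty(); }
  }

  public static void main(String[] args) {
    OverflowChannel disk = new OverflowChannel();
    SpoolingChannel ch = new SpoolingChannel(new BoundedMemoryChannel(3), disk);
    // Next hop is "down": events keep arriving, nothing is taken.
    for (int i = 1; i <= 5; i++) ch.offer("e" + i);
    // Next hop is back: drain everything in order.
    List<String> drained = new ArrayList<>();
    String e;
    while ((e = ch.poll()) != null) drained.add(e);
    System.out.println(drained + " spooled=" + disk.spooled);
  }
}
```

Running `main` prints `[e1, e2, e3, e4, e5] spooled=2`: only the two events that exceeded the memory capacity paid the (simulated) disk cost. A real implementation would additionally have to coordinate the MemoryChannel and FileChannel transactions so that a take or put spanning both can be rolled back consistently.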

> Proposal to support disk based spooling
> ---------------------------------------
>
>                 Key: FLUME-1045
>                 URL: https://issues.apache.org/jira/browse/FLUME-1045
>             Project: Flume
>          Issue Type: New Feature
>    Affects Versions: v1.0.0
>            Reporter: Inder SIngh
>            Priority: Minor
>              Labels: patch
>         Attachments: FLUME-1045-1.patch, FLUME-1045-2.patch
>
>
> 1. Problem Description
> A sink that becomes unavailable at any stage in the pipeline backs off
> and retries after a while. Channels associated with such sinks start
> buffering data, with the caveat that if you are using a memory channel, this
> can cause a domino effect across the entire pipeline: once its capacity is
> exhausted, upstream stages back up in turn. There can be legitimate downtimes,
> e.g. the HDFS sink being down for NameNode maintenance or Hadoop upgrades.
> 2. Why not use a durable channel (JDBC, FileChannel)?
> We want high throughput, and we want to support sink downtimes as a
> first-class use case.