[ https://issues.apache.org/jira/browse/FLUME-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238295#comment-13238295 ]
Sharad Agarwal commented on FLUME-1045:
---------------------------------------

bq. as it violates the transactional exchange invariant of the design

Some systems have very high throughput requirements and relaxed transaction needs. Typically these applications want the system to run at very high throughput and, in case of failures, are OK with losing or replaying a small number of events.

FileChannel intends to be both fully transactional and high throughput. However, it will be IO/disk bound.

1. Curious to know what the right way is, in the current Flume architecture, to trade off transactional guarantees for very high throughput while still providing a certain degree of reliability in case the next link is down?
2. One solution I can think of, where the IO cost is incurred only on failures and things are still transactional: wrap the MemoryChannel and FileChannel in a new channel, say SpoolingMemoryChannel. Events flow via the memory channel; on reaching the buffer capacity of the memory channel, events are spooled into the FileChannel. Since the underlying channels are transactional, SpoolingMemoryChannel can also easily be made transactional.

> Proposal to support disk based spooling
> ---------------------------------------
>
>                 Key: FLUME-1045
>                 URL: https://issues.apache.org/jira/browse/FLUME-1045
>             Project: Flume
>          Issue Type: New Feature
>    Affects Versions: v1.0.0
>            Reporter: Inder SIngh
>            Priority: Minor
>              Labels: patch
>         Attachments: FLUME-1045-1.patch, FLUME-1045-2.patch
>
>
> 1. Problem Description
> A sink being unavailable at any stage in the pipeline causes it to back off
> and retry after a while. Channels associated with such sinks start buffering
> data, with the caveat that if you are using a memory channel it can result in
> a domino effect on the entire pipeline. There could be legitimate down times,
> e.g. the HDFS sink being down for name node maintenance or Hadoop upgrades.
> 2. Why not use a durable channel (JDBC, FileChannel)?
> Want high throughput and support sink down times as a first class use-case.
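
To make the SpoolingMemoryChannel idea from the comment concrete, here is a rough sketch of just the overflow behaviour. It is not Flume code and does not use the Flume Channel or Transaction APIs: the class name `SpoolingBufferSketch`, its methods, and the use of `String` for events are all hypothetical, and the second in-memory queue only stands in for where a FileChannel would sit. Transactions are left out entirely. One assumption the comment does not state: once anything has been spooled, new events keep going to the spool until it drains, so that ordering is preserved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of the overflow logic only; not part of Flume.
 * Events stay in memory until the memory buffer is full, then spill
 * to a spool (a stand-in for a disk-backed FileChannel).
 */
final class SpoolingBufferSketch {
    private final int memoryCapacity;
    private final Deque<String> memory = new ArrayDeque<>();
    // Stand-in for the FileChannel; a real version would write to disk.
    private final Deque<String> spool = new ArrayDeque<>();

    SpoolingBufferSketch(int memoryCapacity) {
        if (memoryCapacity <= 0) {
            throw new IllegalArgumentException("memoryCapacity must be positive");
        }
        this.memoryCapacity = memoryCapacity;
    }

    /** Adds an event, paying the (simulated) disk cost only on overflow. */
    void put(String event) {
        // While the spool holds events, keep appending there so that
        // everything in memory is older than everything in the spool.
        if (spool.isEmpty() && memory.size() < memoryCapacity) {
            memory.addLast(event);
        } else {
            spool.addLast(event);
        }
    }

    /** Returns the oldest event, or null if both buffers are empty. */
    String take() {
        String event = memory.pollFirst();
        return (event != null) ? event : spool.pollFirst();
    }

    int memorySize() {
        return memory.size();
    }

    int spooledSize() {
        return spool.size();
    }
}
```

With a memory capacity of 2, putting "a", "b", "c" leaves two events in memory and one in the spool, and take() still returns them in the order they arrived. In the normal case, where the sink keeps up, nothing reaches the spool, which is the point of the proposal: the IO cost is incurred only when the next link is down or slow.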