[ 
https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753284#comment-13753284
 ] 

Arvind Prabhakar commented on FLUME-2173:
-----------------------------------------

(continuing the discussion here instead of email)

Thanks Hari. In the spirit of keeping processing components pluggable, it would
make sense to make this de-dupe logic pluggable as well. One benefit of doing so
would be a choice of implementations that provide different degrees of
guarantee. For example, the ZK-based approach applied over the entire pipeline
could provide a complete once-only delivery guarantee but, as you pointed out,
could add latency to delivery. Alternatively, there could be locally optimized
implementations of this approach that act on subsets of the event stream and
thus benefit partitioned deployments where events cannot cross wires.

Another use case to consider would be to optimize locally for multiple channels
within the same Agent. For example, an Agent that has a File Channel set up as
the primary channel and a Memory Channel set up as a fall-back channel in case
the primary is full would need local deduping, without having to store state in
ZK.
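
To make the pluggable idea concrete, here is a minimal sketch of what such a
plug-in point might look like, together with a purely local, bounded in-memory
implementation suited to the single-Agent case. Everything named here
(EventDeduplicator, LocalLruDeduplicator, Deduplicators, the "local" type
string) is hypothetical and not part of Flume's API; a ZK-backed variant would
be another implementation of the same interface and is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plug-in point; not part of Flume's actual API.
interface EventDeduplicator {
    /** Returns true the first time an event id is seen, false for a duplicate. */
    boolean firstSeen(String eventId);
}

// Local strategy: remembers only the most recent `capacity` event ids,
// so no state leaves the Agent. Ids older than that window are forgotten.
final class LocalLruDeduplicator implements EventDeduplicator {
    private final Map<String, Boolean> seen;

    LocalLruDeduplicator(final int capacity) {
        // Insertion-ordered map that evicts its eldest entry once over capacity.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    @Override
    public synchronized boolean firstSeen(String eventId) {
        // putIfAbsent returns null only when the id was not already tracked.
        return seen.putIfAbsent(eventId, Boolean.TRUE) == null;
    }
}

// Hypothetical factory: selects an implementation by a configured type name.
final class Deduplicators {
    private Deduplicators() {}

    static EventDeduplicator create(String type, int capacity) {
        if ("local".equals(type)) {
            return new LocalLruDeduplicator(capacity);
        }
        throw new IllegalArgumentException("Unknown deduplicator type: " + type);
    }
}
```

The bounded window is a deliberate trade-off in this sketch: it keeps memory
use fixed, at the cost of missing a duplicate that arrives after its id has
been evicted.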


                
> Exactly once semantics for Flume
> --------------------------------
>
>                 Key: FLUME-2173
>                 URL: https://issues.apache.org/jira/browse/FLUME-2173
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>
> Currently Flume guarantees only at-least-once semantics. This jira is meant
> to track exactly-once semantics for Flume. My initial idea is to include UUID
> event ids on events at the original source (use a config to mark a source as
> an original source) and identify destination sinks. At the destination sinks,
> use a unique ZK znode to track the events. If an event has already been seen
> (and de-duplication is configured), drop the duplicate.
> This might need some refactoring, but my belief is we can do this in a
> backward-compatible way.
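
As a rough illustration of the source-side half of the scheme described in the
issue, the sketch below stamps a UUID into an event's headers at the original
source and leaves an existing id untouched when the event passes through later
hops. The class name EventIds and the header key "eventId" are assumptions made
for this example; the issue does not name either, and tracking seen ids in ZK
at the sink is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper; the header key below is an assumption, not a Flume constant.
final class EventIds {
    static final String HEADER = "eventId";

    private EventIds() {}

    /**
     * Returns a copy of the headers carrying a UUID event id.
     * An id that is already present is preserved, so only the
     * original source ever assigns one.
     */
    static Map<String, String> stamp(Map<String, String> headers) {
        Map<String, String> out = new HashMap<>(headers);
        out.putIfAbsent(HEADER, UUID.randomUUID().toString());
        return out;
    }
}
```

Copying the header map rather than mutating it is a choice made for this
example only, so that the input can be an unmodifiable map.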

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
