[
https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753284#comment-13753284
]
Arvind Prabhakar commented on FLUME-2173:
-----------------------------------------
(continuing the discussion here instead of email)
Thanks Hari. In the spirit of keeping processing components pluggable, it would
make sense to make this de-dupe logic pluggable as well. One benefit of doing so
would be a choice of implementations that provide different degrees of
guarantee. For example, the ZK-based approach over the entire pipeline could
provide a complete once-only delivery guarantee but, as you pointed out, could
add latency to delivery. Alternatively, there could be locally optimized
implementations of this approach that act on subsets of the event stream and
thus benefit partitioned deployments where events cannot cross wires.
Another use case to consider would be to optimize locally for multiple channels
within the same Agent. That way, an Agent that has a File Channel set up as the
primary channel and a Memory Channel set up as a fall-back channel (in case the
primary is full) would need only local de-duping, without having to store state
in ZK.
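
To make the pluggability idea concrete, here is a minimal sketch of what such a plug-in point might look like. Everything in it is an assumption for illustration only: the names EventDeduplicator, LocalLruDeduplicator and markIfFirstSeen are hypothetical and are not part of Flume's API, and the bounded in-memory LRU set is just one possible Agent-local strategy. A ZK-backed implementation could sit behind the same interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical plug-in point (not a Flume interface). */
interface EventDeduplicator {
    /** Returns true the first time an event id is offered, false for a duplicate. */
    boolean markIfFirstSeen(String eventId);
}

/**
 * Hypothetical Agent-local implementation: remembers the most recently
 * used ids in a bounded in-memory map, so no state is stored in ZK.
 * Ids evicted from the map are forgotten, so the guarantee is weaker
 * than a pipeline-wide store would give.
 */
final class LocalLruDeduplicator implements EventDeduplicator {
    private final Map<String, Boolean> seen;

    LocalLruDeduplicator(final int capacity) {
        // accessOrder=true makes the map evict the least recently used id.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    @Override
    public synchronized boolean markIfFirstSeen(String eventId) {
        // put() returns null only when the id was not already present.
        return seen.put(eventId, Boolean.TRUE) == null;
    }
}

class DedupeSketch {
    public static void main(String[] args) {
        EventDeduplicator dedupe = new LocalLruDeduplicator(10_000);
        // Stand-in for the UUID event id attached at the original source.
        String id = UUID.randomUUID().toString();
        System.out.println(dedupe.markIfFirstSeen(id)); // true: first delivery
        System.out.println(dedupe.markIfFirstSeen(id)); // false: duplicate
    }
}
```

The design choice here is that callers depend only on the interface, so the trade-off between latency and strength of guarantee becomes a configuration decision rather than a code change.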
> Exactly once semantics for Flume
> --------------------------------
>
> Key: FLUME-2173
> URL: https://issues.apache.org/jira/browse/FLUME-2173
> Project: Flume
> Issue Type: Bug
> Reporter: Hari Shreedharan
> Assignee: Hari Shreedharan
>
> Currently Flume guarantees only at-least-once semantics. This JIRA is meant
> to track exactly-once semantics for Flume. My initial idea is to include UUID
> event ids on events at the original source (use a config to mark a source as
> an original source) and identify destination sinks. At the destination sinks,
> use a unique ZK znode to track the events. If an event has already been seen
> (and this is configured), pull the duplicate out.
> This might need some refactoring, but my belief is we can do this in a
> backward compatible way.