[ 
https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749012#comment-13749012
 ] 

Hari Shreedharan commented on FLUME-2173:
-----------------------------------------

Yep, that is an important aspect of it, and we need to come up with a way of 
handling it. One approach I can think of is a configurable period during which 
duplicates of an event are detected and dropped. After this period, we go in 
and clean up the older event UUIDs. Since duplication is primarily caused by 
timeouts and similar failures somewhere in the pipeline, once all agents in the 
pipeline are up the event should reach the HDFS sinks pretty quickly, so a 
bounded period should cover most duplicates. 
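
To make the idea concrete, here is a minimal sketch of the dedup period in 
Python. It is not Flume code: the header name "event.uuid", the function 
stamp_event and the class DedupWindow are all hypothetical, and an in-memory 
dict stands in for the ZK znode mentioned in the issue description.

```python
import time
import uuid


def stamp_event(headers):
    """At the original source: attach a UUID unless one is already present."""
    stamped = dict(headers)
    # "event.uuid" is a hypothetical header name, not an existing Flume header.
    stamped.setdefault("event.uuid", str(uuid.uuid4()))
    return stamped


class DedupWindow:
    """At the destination sink: remember event UUIDs for window_seconds."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        # uuid -> time first seen. Entries are only ever added once, so the
        # dict's insertion order is also oldest-first order.
        self.seen = {}

    def is_duplicate(self, event_uuid):
        now = self.clock()
        self._expire(now)
        if event_uuid in self.seen:
            return True
        self.seen[event_uuid] = now
        return False

    def _expire(self, now):
        # Clean up UUIDs whose dedup period has ended, stopping at the first
        # entry that is still inside the window.
        cutoff = now - self.window_seconds
        for key in list(self.seen):
            if self.seen[key] > cutoff:
                break
            del self.seen[key]
```

A sink would call is_duplicate() with the UUID of each incoming event and drop 
the event when it returns True. The trade-off is that a duplicate arriving 
after the period has expired is written again, so the period has to be longer 
than the worst retry delay expected in the pipeline.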
                
> Exactly once semantics for Flume
> --------------------------------
>
>                 Key: FLUME-2173
>                 URL: https://issues.apache.org/jira/browse/FLUME-2173
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>
> Currently Flume guarantees only at-least-once semantics. This jira is meant 
> to track exactly-once semantics for Flume. My initial idea is to include UUID 
> event ids on events at the original source (use a config to mark a source as 
> an original source) and identify destination sinks. At the destination sinks, 
> use a unique ZK znode to track the events. If an event has already been seen 
> (and the sink is so configured), pull the duplicate out.
> This might need some refactoring, but my belief is we can do this in a 
> backward compatible way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
