Hi Gabriel, Arvind, I copied this discussion over to the JIRA (https://issues.apache.org/jira/browse/FLUME-2173?focusedCommentId=13751975&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13751975). Let's continue the discussion there so we can track it better.
Thanks,
Hari

On Tuesday, August 27, 2013 at 5:35 PM, Hari Shreedharan wrote:

> Hi Arvind,
>
> Thanks for your reply. You are right that the global state check and update in the sink will require each sink to explicitly support it. We can, of course, put this implementation in an abstract class which sinks inherit, but yes, this still means code changes.
>
> It makes sense to check state in the channels, pretty much the same way as in the sinks. What is a bit concerning is that we will need to do this check at every agent that the event passes through, and probably make some changes to the channel interface to get rid of race conditions (not sure if that is the case, but I think we will need to). Given that an event is likely to pass through 2-3 tiers, each event gets delayed by the time taken by that many ZK round-trips. I am open to this as well, especially considering that it is likely to be a better out-of-the-box experience for many users (the ones who have their own custom sinks). Would it suffice to check at the sinks on the terminal agent to make sure that an event gets written out only once?
>
> Thinking about this, having once-only delivery at the channel level also opens up some possibilities for doing some sort of processing on events. A guarantee of seeing an event exactly once allows us to do event processing like counters etc. That seems like a good side effect to have.
>
> Either way, I am glad we agree on the approach of checking a global state manager to verify that events are deduped.
>
> Thanks,
> Hari
>
> On Tuesday, August 27, 2013 at 2:12 PM, Arvind Prabhakar wrote:
>
> > Hi Hari,
> >
> > Thanks for bringing this up for discussion. I think it will be tremendously beneficial to Flume users if we can extend a once-only guarantee. Your initial suggestion of having a Sink trap the events and reference a global state to drop duplicates seems reasonable. Rather than pushing this functionality into Sinks, is there any other way we can make it more generally available? The reason I raise this concern is that otherwise this becomes a feature of a particular sink, and not every sink will have the necessary implementation opportunity to get it.
> >
> > Alternatively, what do you think about this being done at the channel level? Since we normally do not see custom implementations of channels, an implementation that works with the channel will likely be more useful to the broader community of Flume users.
> >
> > Regards,
> > Arvind
> >
> > On Sun, Aug 25, 2013 at 9:07 AM, Hari Shreedharan <[email protected]> wrote:
> >
> > > Hi Gabriel,
> > >
> > > Thanks for your input. For the case where we use a replicating channel selector to purposefully replicate, we can easily make it configurable whether to drop duplicate events or not. That should not be difficult to do.
> > >
> > > The second point, where multiple agents/sinks could write the same event, can be solved by namespacing the events. Each sink checks one namespace for the event, and multiple sinks can belong to the same namespace - this way, if multiple sinks are going to write to the same HDFS cluster and a duplicate occurs, we can easily drop it. Unfortunately, this still does not work around the whole HDFS-writing-but-throwing issue.
> > > I agree updating ZK will hit latency, but that is the cost of building once-only semantics on a highly flexible system. If you look at the algorithm, we actually go to ZK only once per event (to create; there are no updates) - this could even happen per batch if needed to reduce ZK round trips (though I am not sure if ZK provides a batch API).
> > >
> > > The two-phase commit approach sounds good, but it might require interface changes, which can now only be made in Flume 2.x. Also, if we use a single UUID combined with several flags, we might be able to work around duplicates caused by this replication.
> > >
> > > Thanks,
> > > Hari
> > >
> > > On Sunday, August 25, 2013 at 7:24 AM, Gabriel Commeau wrote:
> > >
> > > > Hi Hari,
> > > >
> > > > I deleted my comment (again). The mailing list is probably a better avenue to discuss this - sorry about that! :)
> > > >
> > > > I can find at least one other way duplicate events can occur, so what I provided helps to reduce duplicate events but is not sufficient to guarantee exactly-once semantics. However, I still think that using a two-phase commit when writing to multiple channels would benefit Flume. This should probably be a different ticket though.
> > > >
> > > > Concerning the algorithm you offered, the case of the replicating channel selector should probably be handled by creating a new UUID for each duplicate message.
> > > >
> > > > I hope this helps.
> > > >
> > > > Regards,
> > > > Gabriel
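
For concreteness, here is a minimal sketch of the create-once dedup check discussed above, written in Java against the plain ZooKeeper client. The znode layout (one child per event UUID under a per-namespace parent such as /flume/dedup/<namespace>), the "eventId" header name, and the DedupChecker class are illustrative assumptions, not existing Flume code:

    import org.apache.flume.Event;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    /**
     * Hypothetical sink-side helper: an event is "new" iff creating the znode
     * <namespacePath>/<eventId> succeeds. A NodeExistsException means another
     * sink in the same namespace already claimed the event, so drop it.
     */
    public class DedupChecker {

      private final ZooKeeper zk;
      private final String namespacePath;   // e.g. /flume/dedup/hdfs-cluster-1

      public DedupChecker(ZooKeeper zk, String namespacePath) {
        this.zk = zk;
        this.namespacePath = namespacePath;
      }

      /** Returns true if this is the first time the event is seen in the namespace. */
      public boolean claim(Event event) throws KeeperException, InterruptedException {
        String eventId = event.getHeaders().get("eventId"); // UUID set upstream
        try {
          // A single ZK round trip per event: a create, never an update.
          zk.create(namespacePath + "/" + eventId, new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          return true;   // first writer wins; go ahead and deliver
        } catch (KeeperException.NodeExistsException e) {
          return false;  // duplicate; caller should drop the event
        }
      }
    }

Because ZooKeeper creates are atomic, whichever sink claims the znode first delivers the event; every other sink in the same namespace sees NodeExistsException and drops its copy.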
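On the batching question: ZooKeeper 3.4 and later does provide a batched transaction API, multi(), so the per-event creates for a whole sink batch can be collapsed into one round trip. A hedged sketch, reusing the same hypothetical namespace layout and header name as above; note that multi() is all-or-nothing, so a batch containing any already-claimed event is rejected and would need a per-event fallback:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flume.Event;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Op;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class BatchDedup {

      /**
       * One ZK round trip for a whole batch of events. If any create fails
       * (for example because some event was already claimed), the whole
       * transaction is rejected and the caller falls back to per-event claims.
       */
      public static boolean claimBatch(ZooKeeper zk, String namespacePath,
                                       List<Event> batch)
          throws InterruptedException {
        List<Op> ops = new ArrayList<Op>();
        for (Event event : batch) {
          String eventId = event.getHeaders().get("eventId");
          ops.add(Op.create(namespacePath + "/" + eventId, new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT));
        }
        try {
          zk.multi(ops);
          return true;   // every event in the batch is new
        } catch (KeeperException e) {
          return false;  // transaction rejected; resolve duplicates per event
        }
      }
    }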
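Finally, Gabriel's point about the replicating channel selector - giving each replicated copy its own UUID so the dedup check does not discard intentional copies - could look roughly like the following against the public Event/EventBuilder API. The helper name and the "eventId" header are again assumptions; wiring this into a selector or interceptor would of course require the code changes discussed above:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.UUID;
    import org.apache.flume.Event;
    import org.apache.flume.event.EventBuilder;

    public class ReplicaIds {

      /**
       * Copy an event for replication, giving the copy its own eventId header
       * so the replicas are not mistaken for duplicates by the dedup check.
       */
      public static Event copyWithNewId(Event original) {
        Map<String, String> headers =
            new HashMap<String, String>(original.getHeaders());
        headers.put("eventId", UUID.randomUUID().toString());
        return EventBuilder.withBody(original.getBody(), headers);
      }
    }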
