We reconcile our data using Hadoop throughout the day to "heal" what we are live streaming (especially for revenue-generating data).
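A minimal sketch of what that heal/overwrite pass could look like, purely for illustration (the table and column names are hypothetical and not from this thread): the batch job reads the streamed value for a bucket, overwrites it with the Hadoop-computed value, and returns the diff so it can be tracked over time.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class BatchHeal {

        /**
         * Overwrite the streamed (approximate) value for one bucket with the
         * batch-computed value, and return the diff between the two.
         * Hypothetical table "hourly_revenue" keyed by (metric, hour).
         */
        public static long healBucket(Connection db, String metric, long hourBucket,
                                      long batchValue) throws Exception {
            long streamedValue = 0;
            try (PreparedStatement read = db.prepareStatement(
                    "SELECT value FROM hourly_revenue WHERE metric = ? AND hour = ?")) {
                read.setString(1, metric);
                read.setLong(2, hourBucket);
                try (ResultSet rs = read.executeQuery()) {
                    if (rs.next()) streamedValue = rs.getLong(1);
                }
            }
            try (PreparedStatement write = db.prepareStatement(
                    "UPDATE hourly_revenue SET value = ? WHERE metric = ? AND hour = ?")) {
                write.setLong(1, batchValue);
                write.setString(2, metric);
                write.setLong(3, hourBucket);
                write.executeUpdate();
            }
            // the per-bucket diff whose variance is worth tracking over time
            return batchValue - streamedValue;
        }
    }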
So our setup is basically (seriously high level):

  server > kafka > aggregate > persist insert (happens in seconds)
  &&
  server > log file > hadoop map/reduce > re-persist update/overwrite (happens every 20-180 minutes)

so we get the best of both worlds and can audit our trade-offs => real-time analytics of our data processing and, within the time we SLA on for reporting, 100% accurate data. One thing I have been meaning to do is keep track of the variance of those diffs, to-be-dos.

On Fri, Aug 24, 2012 at 12:05 PM, Taylor Gautier <tgaut...@tagged.com> wrote:

> Re-sending could lead to duplicated messages however - consider the case
> that the broker has committed the message and sent the ack but the ack is
> either sitting in the send buffer of the broker or the recv buffer of the
> producer.
>
> Just saying that you will trade off a relatively small amount of loss for a
> probably smaller amount of duplication using only a simple ack.
>
>
> On Fri, Aug 24, 2012 at 6:49 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Xiaoyu,
> >
> > In 0.7, we have this problem that the producer doesn't receive any ack.
> > So, syncProducer.send is considered successful as soon as the messages
> > are in the socket buffer. If the broker goes down before the socket
> > buffer is flushed, those supposedly successful messages are lost. What's
> > worse is that the producer doesn't know this since it doesn't wait for a
> > response. Such lost messages should be few. However, I am not sure how
> > to reduce/prevent it. This issue will be addressed in 0.8, in which the
> > producer will receive an ack. If a broker goes down in the middle of a
> > send, the producer will get an exception and can resend.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Aug 23, 2012 at 5:19 PM, xiaoyu wang <xiaoyu.w...@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > We are using a sync producer to push messages to kafka brokers. It
> > > stops once it receives an IOException: connection reset by peer. It
> > > seems that when a broker goes down, we lose some messages. I have
> > > reduced "log.flush.interval" to 1, and still see >200 messages lost.
> > >
> > > I also reduced the batch.size on the producer side to 10, but the
> > > message loss is about the same.
> > >
> > > So, what's the best way to minimize message loss when a broker goes down?
> > >
> > > Thanks,
> > >
> > > -Xiaoyu

--
/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/
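To make the ack/resend discussion above concrete, here is a minimal sketch of the resend-on-error pattern Jun describes for 0.8, written against a hypothetical acked-send interface rather than the real Kafka producer API. As Taylor points out, retrying after a lost ack trades potential loss for potential duplication.

    import java.io.IOException;

    public class ResendingSender {

        /** Hypothetical stand-in for an acked send, e.g. a 0.8-style sync producer. */
        public interface AckedSender {
            void send(byte[] message) throws IOException; // returns only after the broker acks
        }

        /**
         * Retry a failed send up to maxAttempts times (maxAttempts >= 1).
         * If the broker committed the message but the ack was lost (e.g. the
         * connection was reset after the write), the retry duplicates it:
         * less loss in exchange for a little duplication.
         */
        public static void sendWithRetry(AckedSender sender, byte[] message, int maxAttempts)
                throws IOException {
            IOException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    sender.send(message);
                    return;              // acked: done
                } catch (IOException e) {
                    last = e;            // broker may or may not have the message
                }
            }
            throw last;                  // give up; caller decides what "lost" means
        }
    }

Downstream deduplication, or a batch reconcile pass like the one described at the top of this message, is what absorbs the occasional duplicate this pattern can produce.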