Re: Getting timeouts with elastic load balancer in AWS

Vaibhav Puranik Thu, 28 Jun 2012 17:41:04 -0700

To rule out any variables introduced by my restarting the broker, I ran a
test through Amazon ELB (0.7.1 producer and 0.7.0 broker), so there were no
broker restarts. The connection was getting broken because Amazon ELB was
closing all the connections.


I found the exact same result. In spite of specifying num.retries and
reconnect.time.interval.ms = 50000, we lose one batch. I understand that
num.retries does not guarantee that all the messages will be sent, but I
feel it should in this case. Please let me know if my expectation is
unreasonable.
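
For reference, the producer settings for this kind of ELB test can be
sketched roughly as below. This is only an illustration, not the exact test
code: the class name, the build() helper, and the ELB hostname are
hypothetical, and num.retries = 3 is assumed from the settings quoted further
down in this thread. Only the property keys come from the thread. The sketch
just builds a java.util.Properties object and does not call any Kafka API.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): collects the 0.7.x producer
// settings discussed in this thread into one Properties object.
class ElbProducerSettings {

    // brokerList uses the "brokerId:host:port" form shown in the thread.
    static Properties build(String brokerList) {
        Properties properties = new Properties();
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        properties.put("broker.list", brokerList);
        properties.put("producer.type", "async");
        // Assumed value; per the thread, retries apply only when a send
        // throws an exception, so this does not guarantee delivery.
        properties.put("num.retries", "3");
        // Time-based reconnect (0.7.1), meant to refresh the connection
        // before the ELB closes it.
        properties.put("reconnect.time.interval.ms", "50000");
        return properties;
    }

    public static void main(String[] args) {
        // "my-elb.example.com" is a placeholder hostname, not a real endpoint.
        Properties p = build("0:my-elb.example.com:9092");
        p.list(System.out);
    }
}
```

The resulting Properties object would then be handed to the producer
configuration in the usual way for the Kafka version in use.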

Regards,
Vaibhav


On Thu, Jun 28, 2012 at 2:37 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Just to clarify: num.retries > 0 does not guarantee that all messages will
> be received at the broker. It guarantees retry on exceptions - so it cannot
> handle the corner case when the broker goes down after the message is
> written to the socket buffer but before the buffer is flushed (in which
> case no exceptions are thrown). This is addressed in 0.8 with producer
> acks.
>
> That said, you have a fairly large interval between messages so it's rather
> surprising. It might help to correlate this with broker-side logs to see if
> the "Message sent" for message 5 was actually received on the broker.
>
> Thanks,
>
> Joel
>
> On Thu, Jun 28, 2012 at 1:36 PM, Vaibhav Puranik <vpura...@gmail.com>
> wrote:
>
> > Jun,
> >
> > Here is the log with SyncProducer and DefaultEventHandler trace enabled.
> >
> > http://pastebin.com/dTm5RSJ9
> >
> > Here are my producer settings:
> >
> > properties.put("serializer.class", "kafka.serializer.StringEncoder");
> > properties.put("broker.list", "0:localhost:9092");
> > properties.put("producer.type", "async");
> > properties.put("num.retries", "3");
> > properties.put("batch.size", "5");
> >
> > (This batch size doesn't take effect, I think because the flush time is
> > small, 5 seconds, so it sends every message as it comes.) I am sleeping
> > for 15 seconds between messages.
> >
> > Here is my broker output:
> > _____0_____  _____1_____  _____2_____  _____3_____  _____4_____
> > _____6_____  _____7_____  _____8_____  _____9_____
> >
> >
> > Notice that message 5 is missing. I restarted the broker between 4 and 5.
> > On the producer, for some reason, the error appears between 6 and 7. I
> > don't know why.
> >
> > Regards,
> > Vaibhav
> >
> >
> > On Thu, Jun 28, 2012 at 11:15 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Could you enable trace logging in DefaultEventHandler to see if the
> > > following message shows up after the warning?
> > >          trace("kafka producer sent messages for topics %s to broker %s:%d (on attempt %d)"
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Jun 28, 2012 at 10:44 AM, Vaibhav Puranik <vpura...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I don't think num.retries (0.7.1) is working. Here is how I tested
> > > > it.
> > > >
> > > > I wrote a simple producer that sends messages with the following
> > > > strings: "____1_____", "_____2_____", ... As you can see, the messages
> > > > are sequential. I tailed the topic log on the broker. After sending
> > > > each message, I added a Thread.sleep of 15 seconds.
> > > >
> > > > Every time I send a message, it immediately appears in the broker
> > > > log. But if I restart the broker to simulate a producer connection
> > > > drop (during the 15-second producer sleep period), the producer prints
> > > > the following messages in its logs:
> > > >
> > > > [2012-06-28 10:31:17,127] INFO Disconnecting from localhost:9092 (kafka.producer.SyncProducer)
> > > > [2012-06-28 10:31:17,132] WARN Error sending messages, 2 attempts remaining (kafka.producer.async.DefaultEventHandler)
> > > > [2012-06-28 10:31:17,132] INFO Connected to localhost:9092 for producing (kafka.producer.SyncProducer)
> > > >
> > > > But the message that was sent right after the broker restart never
> > > > reaches the broker. The message after that (the 2nd message after the
> > > > restart) gets to the broker fine and the sequence continues. Thus, if
> > > > I restart the broker in the sleep period between messages 4 and 5, I
> > > > don't get message 5. I get messages 1,2,3,4,6,7,.....
> > > >
> > > > I tried setting num.retries to 1 and 2, thinking that the first retry
> > > > might reconnect and the second retry is where it resends the message.
> > > > But that doesn't work. The number of retries doesn't improve the
> > > > situation.
> > > >
> > > > Can you see any flaw in my testing? What can I do to better test this
> > > > scenario? How can I ensure that no messages are dropped? I don't think
> > > > I am losing the message because it's in broker memory. Please correct
> > > > me if I am wrong.
> > > >
> > > > Regards,
> > > > Vaibhav
> > > > GumGum <http://gumgum.com>
> > > >
> > > >
> > > >
> > > > On Wed, Jun 27, 2012 at 3:42 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >
> > > > > 0.7.1 has this: reconnect.time.interval.ms
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Joel
> > > > >
> > > > > On Wed, Jun 27, 2012 at 3:31 PM, Vaibhav Puranik <vpura...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > That will be awesome. It will definitely address the AWS ELB problem.
> > > > > >
> > > > > > +1 for "reconnect.interval".
> > > > > >
> > > > > > Regards,
> > > > > > Vaibhav
> > > > > > GumGum
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 27, 2012 at 3:24 PM, Niek Sanders <niek.sand...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Do producers currently leave the sockets to the brokers open
> > > > > > > indefinitely?
> > > > > > >
> > > > > > > It might make sense to add a second producer config param similar
> > > > > > > to "reconnect.interval" which limits on time instead of message
> > > > > > > count. (And then reconnect based on whichever criterion is hit
> > > > > > > first.) Folks going through ELBs on AWS would set
> > > > > > > reconnect.interval.sec to something like 50 sec as a workaround
> > > > > > > for low-volume producers.
> > > > > > >
> > > > > > > - Niek
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jun 26, 2012 at 4:52 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > > > > > Set num.retries in producer config property file. It defaults
> > > > > > > > to 0.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
