Re: Getting timeouts with elastic load balancer in AWS

Joel Koshy Thu, 28 Jun 2012 14:37:36 -0700

Just to clarify: num.retries > 0 does not guarantee that all messages will
be received at the broker. It guarantees retry on exceptions - so it cannot
handle the corner case when the broker goes down after the message is
written to the socket buffer but before the buffer is flushed (in which
case no exceptions are thrown). This is addressed in 0.8 with producer acks.
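
To make the distinction concrete, here is a minimal sketch of retry-on-exception semantics. It is a toy model, not Kafka code: the Transport interface, sendWithRetries and the simulated broker are all hypothetical names invented for illustration. The simulated broker silently drops message "5" (standing in for the unflushed socket buffer case) and throws once on message "7".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Toy model of a producer that retries only when send() throws. Not Kafka code. */
class RetryOnExceptionDemo {

    /** Hypothetical stand-in for a broker connection. */
    interface Transport {
        void send(String message) throws IOException;
    }

    /**
     * Makes one attempt plus up to numRetries retries, retrying only on an exception.
     * Returns true as soon as an attempt returns normally.
     */
    static boolean sendWithRetries(Transport transport, String message, int numRetries) {
        for (int attempt = 0; attempt <= numRetries; attempt++) {
            try {
                transport.send(message);
                return true; // no exception, so the producer assumes success
            } catch (IOException e) {
                // an exception is the only signal that triggers another attempt
            }
        }
        return false;
    }

    /** Sends "0".."9" through a simulated flaky broker and returns what it received. */
    static List<String> runScenario(int numRetries) {
        List<String> received = new ArrayList<>();
        boolean[] failedOnce = {false};
        Transport flaky = msg -> {
            if (msg.equals("5")) {
                return; // accepted locally, lost before reaching the broker, no exception
            }
            if (msg.equals("7") && !failedOnce[0]) {
                failedOnce[0] = true;
                throw new IOException("simulated connection reset");
            }
            received.add(msg);
        };
        for (int i = 0; i < 10; i++) {
            sendWithRetries(flaky, String.valueOf(i), numRetries);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println("received with 3 retries: " + runScenario(3));
        System.out.println("received with 0 retries: " + runScenario(0));
    }
}
```

In this model retries rescue "7", because the failure surfaced as an exception, but no retry count brings back "5", because the producer never learns that anything went wrong.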


That said, you have a fairly large interval between messages so it's rather
surprising. It might help to correlate this with broker-side logs to see if
the "Message sent" for message 5 was actually received on the broker.
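
One way to do that correlation without eyeballing the log is to scan the broker-side output for the sequential test payloads and report the gaps. The sketch below assumes the payloads look like _____5_____ (five underscores, a number, five underscores), as in the test described in this thread; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Finds missing sequence numbers among test payloads found in broker-side text. */
class SequenceGapCheck {

    // Assumed payload shape: five underscores, digits, five underscores.
    private static final Pattern MARKER = Pattern.compile("_{5}(\\d+)_{5}");

    /** Returns every number between the smallest and largest seen that never appears. */
    static List<Integer> missing(String brokerLogText) {
        TreeSet<Integer> seen = new TreeSet<>();
        Matcher m = MARKER.matcher(brokerLogText);
        while (m.find()) {
            seen.add(Integer.parseInt(m.group(1)));
        }
        List<Integer> gaps = new ArrayList<>();
        if (seen.isEmpty()) {
            return gaps;
        }
        for (int i = seen.first(); i <= seen.last(); i++) {
            if (!seen.contains(i)) {
                gaps.add(i);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Sample text only; in practice this would be read from the topic log.
        String sample = "_____0_____ _____1_____ _____2_____ _____3_____ "
                + "_____4_____ _____6_____ _____7_____";
        System.out.println("missing: " + missing(sample));
    }
}
```

This only detects gaps inside the observed range, so a message lost at the very start or end of a run would need the expected range passed in separately.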

Thanks,

Joel

On Thu, Jun 28, 2012 at 1:36 PM, Vaibhav Puranik <vpura...@gmail.com> wrote:

> Jun,
>
> Here is the log with SyncProducer and DefaultEventHandler trace enabled.
>
> http://pastebin.com/dTm5RSJ9
>
> Here are my producer settings:
>
> properties.put("serializer.class", "kafka.serializer.StringEncoder");
> properties.put("broker.list", "0:localhost:9092");
> properties.put("producer.type", "async");
> properties.put("num.retries", "3");
> properties.put("batch.size", "5");
>
> (The batch size doesn't take effect, I think because the flush time is
> short - 5 seconds - so it sends every message as it comes.) I am sleeping
> for 15 seconds between messages.
>
> Here is my broker output:
> _____0_____  _____1_____  _____2_____  _____3_____  _____4_____
> _____6_____  _____7_____  _____8_____  _____9_____
>
>
> Notice that number 5 is missing. I restarted the broker between 4 and 5.
> On the producer, for some reason, the error appears between 6 and 7. I
> don't know why.
>
> Regards,
> Vaibhav
>
>
> On Thu, Jun 28, 2012 at 11:15 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Could you enable trace logging in DefaultEventHandler to see if the
> > following message shows up after the warning?
> >          trace("kafka producer sent messages for topics %s to broker %s:%d (on attempt %d)"
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Jun 28, 2012 at 10:44 AM, Vaibhav Puranik <vpura...@gmail.com
> > >wrote:
> >
> > > Hi all,
> > >
> > > I don't think num.retries (0.7.1) is working. Here is how I tested it.
> > >
> > > I wrote a simple producer that sends messages with the following strings -
> > > "_____1_____", "_____2_____"..... . As you can see all the messages are
> > > sequential.
> > > I tailed the topic log on the broker. After sending every message, I have
> > > added a Thread.sleep for 15 seconds.
> > >
> > > Every time I send a message, it immediately appears in the broker log.
> > > But if I restart the broker to simulate a producer connection drop
> > > (during the 15-second producer sleep period), the producer prints the
> > > following messages in its logs:
> > >
> > > [2012-06-28 10:31:17,127] INFO Disconnecting from localhost:9092 (kafka.producer.SyncProducer)
> > > [2012-06-28 10:31:17,132] WARN Error sending messages, 2 attempts remaining (kafka.producer.async.DefaultEventHandler)
> > > [2012-06-28 10:31:17,132] INFO Connected to localhost:9092 for producing (kafka.producer.SyncProducer)
> > >
> > > But the message sent right after the broker restart never reaches the
> > > broker. The message after that (the 2nd message after the restart) gets
> > > to the broker fine and the sequence continues. Thus if I restart the
> > > broker in the sleep period between messages 4 and 5, I don't get
> > > message 5. I get messages 1,2,3,4,6,7,.....
> > >
> > > I tried setting num.retries to 1 and 2, thinking that the first retry
> > > might reconnect and the second retry is where it resends the message.
> > > But that doesn't work. The number of retries doesn't improve the situation.
> > >
> > > Can you see any flaw in my testing? What can I do to better test this
> > > scenario? How can I ensure that no messages are dropped? I don't think I
> > > am losing the message because it's in broker memory. Please correct me
> > > if I am wrong.
> > >
> > > Regards,
> > > Vaibhav
> > > GumGum <http://gumgum.com>
> > >
> > >
> > >
> > > On Wed, Jun 27, 2012 at 3:42 PM, Joel Koshy <jjkosh...@gmail.com>
> wrote:
> > >
> > > > 0.7.1 has this: reconnect.time.interval.ms
> > > >
> > > > Thanks,
> > > >
> > > > Joel
> > > >
> > > > On Wed, Jun 27, 2012 at 3:31 PM, Vaibhav Puranik <vpura...@gmail.com
> >
> > > > wrote:
> > > >
> > > > > > That will be awesome. It will definitely address the AWS ELB problem.
> > > > >
> > > > > +1 for "reconnect.interval".
> > > > >
> > > > > Regards,
> > > > > Vaibhav
> > > > > GumGum
> > > > >
> > > > >
> > > > > On Wed, Jun 27, 2012 at 3:24 PM, Niek Sanders <
> > niek.sand...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > > Do producers currently leave the sockets to the brokers open
> > > > > > > indefinitely?
> > > > > > >
> > > > > > > It might make sense to add a second producer config param similar to
> > > > > > > "reconnect.interval" which limits on time instead of message count.
> > > > > > > (And then reconnect based on whichever criterion is hit first).  For
> > > > > > > folks going through ELBs on AWS, they'd set the reconnect.interval.sec
> > > > > > > to something like 50 sec as a workaround for low-volume producers.
> > > > > >
> > > > > > - Niek
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 26, 2012 at 4:52 PM, Jun Rao <jun...@gmail.com>
> wrote:
> > > > > > > > Set num.retries in producer config property file. It defaults to 0.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
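
For reference, the "whichever criterion is hit first" reconnect rule proposed in the quoted thread above could be modeled roughly as follows. This is only a sketch of the idea, not how the 0.7.1 producer implements reconnect.time.interval.ms; ReconnectPolicy and its methods are hypothetical names, and the message-count limit in main is an arbitrary illustrative value. The 50-second age limit is the figure suggested in the thread for ELB users.

```java
/** Hypothetical model: reconnect after N messages or T milliseconds, whichever comes first. */
class ReconnectPolicy {
    private final int maxMessages;   // count-based limit (illustrative)
    private final long maxAgeMs;     // time-based limit
    private int sentSinceConnect;
    private long connectedAtMs;

    ReconnectPolicy(int maxMessages, long maxAgeMs, long nowMs) {
        this.maxMessages = maxMessages;
        this.maxAgeMs = maxAgeMs;
        this.sentSinceConnect = 0;
        this.connectedAtMs = nowMs;
    }

    /** Check before each send: true if the connection should be re-established first. */
    boolean shouldReconnect(long nowMs) {
        boolean tooManyMessages = sentSinceConnect >= maxMessages;
        boolean tooOld = nowMs - connectedAtMs >= maxAgeMs;
        return tooManyMessages || tooOld;
    }

    /** Record one message sent on the current connection. */
    void onSend() {
        sentSinceConnect++;
    }

    /** Reset both counters after a fresh connection is made. */
    void onReconnect(long nowMs) {
        sentSinceConnect = 0;
        connectedAtMs = nowMs;
    }

    public static void main(String[] args) {
        // 1000 messages is an arbitrary example value; 50 s follows the thread's suggestion.
        ReconnectPolicy policy = new ReconnectPolicy(1000, 50_000L, System.currentTimeMillis());
        System.out.println("reconnect now? " + policy.shouldReconnect(System.currentTimeMillis()));
    }
}
```

Passing the clock in as a parameter keeps the rule easy to test; a low-volume producer would hit the time limit long before the message count.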
