To rule out any side effects of my restarting the broker, I ran the same test through an Amazon ELB instead (0.7.1 producer, 0.7.0 broker), so there were no broker restarts. This time the connection was broken because the ELB was closing all the connections.
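The loss pattern in this thread can be reproduced with a small model of "retry on exceptions". Messages 1 to 4 arrive, 5 vanishes, and 6 onward arrive. The Java simulation below is a minimal sketch under one stated assumption, a simplification of common TCP behaviour: the first write on a socket that the peer (the ELB, or a restarted broker) has already closed is accepted into the local send buffer without error and silently lost, and only a later write fails. The classes `SimulatedConnection` and `RetryingSender` are hypothetical and do not model Kafka's SyncProducer or DefaultEventHandler. The retry loop only retries when `write` throws, which mirrors how num.retries is described in the thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RetryOnExceptionDemo {

    /** Hypothetical model of one TCP connection; "delivered" stands in for the broker's log. */
    static final class SimulatedConnection {
        private final List<String> delivered;
        private boolean closedByPeer = false; // e.g. the ELB dropped the connection
        private boolean resetPending = false; // set once a write has hit the closed socket

        SimulatedConnection(List<String> delivered) {
            this.delivered = delivered;
        }

        void closeByPeer() {
            closedByPeer = true;
        }

        void write(String message) throws IOException {
            if (!closedByPeer) {
                delivered.add(message); // healthy connection: message reaches the broker
                return;
            }
            if (!resetPending) {
                // Assumed behaviour: the first write after the peer closed the socket
                // is buffered locally without error, so the message is silently lost.
                resetPending = true;
                return;
            }
            // Later writes fail, which is the only case a retry loop can detect.
            throw new IOException("Connection reset by peer");
        }
    }

    /** Hypothetical sender that, like num.retries, retries only when write throws. */
    static final class RetryingSender {
        private final int numRetries;
        private final List<String> delivered;
        private SimulatedConnection connection;

        RetryingSender(int numRetries, List<String> delivered) {
            this.numRetries = numRetries;
            this.delivered = delivered;
            this.connection = new SimulatedConnection(delivered);
        }

        SimulatedConnection connection() {
            return connection;
        }

        void send(String message) {
            for (int attempt = 0; attempt <= numRetries; attempt++) {
                try {
                    connection.write(message);
                    return; // no exception, so the sender assumes success
                } catch (IOException e) {
                    connection = new SimulatedConnection(delivered); // reconnect, then retry
                }
            }
            throw new IllegalStateException("Giving up on " + message);
        }
    }

    /** Sends messages 1..7 and drops the connection between 4 and 5, as in the thread. */
    static List<String> run() {
        List<String> delivered = new ArrayList<>();
        RetryingSender sender = new RetryingSender(3, delivered);
        for (int i = 1; i <= 7; i++) {
            if (i == 5) {
                sender.connection().closeByPeer();
            }
            sender.send("_____" + i + "_____");
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Under this model, `run()` returns the payloads for messages 1, 2, 3, 4, 6 and 7. Message 5 is missing even though three retries were allowed, because its write never raised an exception. The failure only surfaces on message 6, which is retried over a fresh connection and delivered.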
I found exactly the same result. Despite setting num.retries and reconnect.time.interval.ms = 50000, we lose one batch. I understand that num.retries does not guarantee that all the messages will be sent, but I feel it should in this case. Please let me know if my expectation is unreasonable.

Regards,
Vaibhav

On Thu, Jun 28, 2012 at 2:37 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> Just to clarify: num.retries > 0 does not guarantee that all messages will be received at the broker. It guarantees retry on exceptions, so it cannot handle the corner case where the broker goes down after the message is written to the socket buffer but before the buffer is flushed (in which case no exception is thrown). This is addressed in 0.8 with producer acks.
>
> That said, you have a fairly large interval between messages, so this is rather surprising. It might help to correlate this with the broker-side logs to see whether the "Message sent" for message 5 was actually received on the broker.
>
> Thanks,
>
> Joel
>
> On Thu, Jun 28, 2012 at 1:36 PM, Vaibhav Puranik <vpura...@gmail.com> wrote:
>
> > Jun,
> >
> > Here is the log with SyncProducer and DefaultEventHandler trace logging enabled.
> >
> > http://pastebin.com/dTm5RSJ9
> >
> > Here are my producer settings:
> >
> > properties.put("serializer.class", "kafka.serializer.StringEncoder");
> > properties.put("broker.list", "0:localhost:9092");
> > properties.put("producer.type", "async");
> > properties.put("num.retries", "3");
> > properties.put("batch.size", "5");
> >
> > (This batch size doesn't take effect, I think because some flush time is small (5 seconds), so it sends every message as it comes.) I am sleeping for 15 seconds between messages.
> >
> > Here is my broker output:
> > _____0_____ _____1_____ _____2_____ _____3_____ _____4_____ _____6_____ _____7_____ _____8_____ _____9_____
> >
> > Notice that number 5 is missing. I restarted the broker between 4 and 5.
> > You can see that message 5 is missing. On the producer, for some reason, the error appears between 6 and 7. I don't know why.
> >
> > Regards,
> > Vaibhav
> >
> > On Thu, Jun 28, 2012 at 11:15 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Could you enable trace logging in DefaultEventHandler to see if the following message shows up after the warning?
> > > trace("kafka producer sent messages for topics %s to broker %s:%d (on attempt %d)"
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Jun 28, 2012 at 10:44 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I don't think num.retries (0.7.1) is working. Here is how I tested it.
> > > >
> > > > I wrote a simple producer that sends messages with the strings "_____1_____", "_____2_____", and so on, so the messages are sequential.
> > > > I tailed the topic log on the broker. After sending each message, the producer sleeps for 15 seconds (Thread.sleep).
> > > >
> > > > Every time I send a message, it immediately appears in the broker log. But if I restart the broker during the 15-second producer sleep to simulate a dropped producer connection, the producer prints the following in its logs:
> > > >
> > > > [2012-06-28 10:31:17,127] INFO Disconnecting from localhost:9092 (kafka.producer.SyncProducer)
> > > > [2012-06-28 10:31:17,132] WARN Error sending messages, 2 attempts remaining (kafka.producer.async.DefaultEventHandler)
> > > > [2012-06-28 10:31:17,132] INFO Connected to localhost:9092 for producing (kafka.producer.SyncProducer)
> > > >
> > > > But the message sent right after the broker restart never reaches the broker. The message after that (the second message after the restart) reaches the broker fine, and the sequence continues.
> > > > Thus, if I restart the broker in the sleep period between messages 4 and 5, I don't get message 5. I get messages 1, 2, 3, 4, 6, 7, and so on.
> > > >
> > > > I tried setting num.retries to 1 and 2, thinking that the first retry might reconnect and the second retry would resend the message. But that doesn't work: the number of retries doesn't improve the situation.
> > > >
> > > > Can you see any flaw in my testing? What can I do to test this scenario better? How can I ensure that no messages are dropped? I don't think I am losing the message because it's in broker memory. Please correct me if I am wrong.
> > > >
> > > > Regards,
> > > > Vaibhav
> > > > GumGum <http://gumgum.com>
> > > >
> > > > On Wed, Jun 27, 2012 at 3:42 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >
> > > > > 0.7.1 has this: reconnect.time.interval.ms
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Joel
> > > > >
> > > > > On Wed, Jun 27, 2012 at 3:31 PM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> > > > >
> > > > > > That will be awesome. It will definitely address the AWS ELB problem.
> > > > > >
> > > > > > +1 for "reconnect.interval".
> > > > > >
> > > > > > Regards,
> > > > > > Vaibhav
> > > > > > GumGum
> > > > > >
> > > > > > On Wed, Jun 27, 2012 at 3:24 PM, Niek Sanders <niek.sand...@gmail.com> wrote:
> > > > > >
> > > > > > > Do producers currently leave the sockets to the brokers open indefinitely?
> > > > > > >
> > > > > > > It might make sense to add a second producer config param, similar to "reconnect.interval", that limits on time instead of message count (and then reconnect based on whichever criterion is hit first).
> > > > > > > For folks going through ELBs on AWS, they'd set reconnect.interval.sec to something like 50 sec as a workaround for low-volume producers.
> > > > > > >
> > > > > > > - Niek
> > > > > > >
> > > > > > > On Tue, Jun 26, 2012 at 4:52 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > > > > > Set num.retries in the producer config property file. It defaults to 0.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
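Niek's proposal is to reconnect when either a message-count limit or a time limit is reached, whichever comes first. It can be sketched as a small policy object. This is a hypothetical illustration, not Kafka's implementation of reconnect.interval or reconnect.time.interval.ms. The class and method names (`ReconnectPolicy`, `shouldReconnectBeforeSend`, `onConnect`, `onSend`) are made up, and the clock is injected as a `LongSupplier` so the logic can be exercised without sleeping.

```java
import java.util.function.LongSupplier;

/** Hypothetical policy: reconnect after N messages or T milliseconds, whichever comes first. */
public final class ReconnectPolicy {
    private final int reconnectInterval;        // max messages per connection
    private final long reconnectTimeIntervalMs; // max connection age in milliseconds
    private final LongSupplier clockMs;         // injected clock, e.g. System::currentTimeMillis
    private int sentOnConnection;
    private long connectedAtMs;

    public ReconnectPolicy(int reconnectInterval, long reconnectTimeIntervalMs, LongSupplier clockMs) {
        this.reconnectInterval = reconnectInterval;
        this.reconnectTimeIntervalMs = reconnectTimeIntervalMs;
        this.clockMs = clockMs;
        onConnect();
    }

    /** Call whenever a new connection has been established. */
    public void onConnect() {
        sentOnConnection = 0;
        connectedAtMs = clockMs.getAsLong();
    }

    /** Call before each send; true means "close and reopen the connection first". */
    public boolean shouldReconnectBeforeSend() {
        long ageMs = clockMs.getAsLong() - connectedAtMs;
        return sentOnConnection >= reconnectInterval || ageMs >= reconnectTimeIntervalMs;
    }

    /** Call after each message written on the current connection. */
    public void onSend() {
        sentOnConnection++;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock for the demo
        ReconnectPolicy policy = new ReconnectPolicy(30_000, 50_000L, () -> now[0]);
        // Low-volume producer: one message every 15 seconds, as in the thread's test.
        for (int i = 1; i <= 5; i++) {
            if (policy.shouldReconnectBeforeSend()) {
                System.out.println("reconnect before message " + i + " at t=" + now[0] + " ms");
                policy.onConnect();
            }
            policy.onSend();
            now[0] += 15_000L;
        }
    }
}
```

With the illustrative limits in `main` (30,000 messages or 50 seconds), the only line printed is "reconnect before message 5 at t=60000 ms". The check deliberately runs before each send rather than after it. Suppose the load balancer only closes connections that have been idle longer than its timeout, and the time limit is set below that timeout. Any connection idle for that long is at least that old, so the policy replaces it before a message is written to it.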