Jun,

Here is the log with SyncProducer and DefaultEventHandler trace enabled.
http://pastebin.com/dTm5RSJ9

Here are my producer settings:

properties.put("serializer.class", "kafka.serializer.StringEncoder");
properties.put("broker.list", "0:localhost:9092");
properties.put("producer.type", "async");
properties.put("num.retries", "3");
properties.put("batch.size", "5");

(This batch size doesn't take effect, I think because the flush time is small, 5 seconds, so it sends every message as it comes.) I am sleeping for 15 seconds between messages.

Here is my broker output:

_____0_____{�D�_____1_____�&6c_____2_____6z��_____3_____+�~_____4_____f�tu_____6_____����_____7_____\�_____8_____��Ơ_____9_____

Notice that number 5 is missing. I restarted the broker between 4 and 5. On the producer, for some reason, the error appears between 6 and 7. I don't know why.

Regards,
Vaibhav

On Thu, Jun 28, 2012 at 11:15 AM, Jun Rao <jun...@gmail.com> wrote:

> Could you enable trace logging in DefaultEventHandler to see if the
> following message shows up after the warning?
> trace("kafka producer sent messages for topics %s to broker %s:%d (on attempt %d)"
>
> Thanks,
>
> Jun
>
> On Thu, Jun 28, 2012 at 10:44 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
>
> > Hi all,
> >
> > I don't think num.retries (0.7.1) is working. Here is how I tested it.
> >
> > I wrote a simple producer that sends messages with the following strings:
> > "____1_____", "_____2_____", ... As you can see, all the messages are
> > sequential. I tailed the topic log on the broker. After sending every
> > message, I have added a Thread.sleep for 15 seconds.
> >
> > Every time I send a message, it immediately appears in the broker log.
> > But if I restart the broker to simulate a producer connection drop
> > (during the 15-second producer sleep period), it prints the following
> > messages in the logs:
> >
> > [2012-06-28 10:31:17,127] INFO Disconnecting from localhost:9092 (kafka.producer.SyncProducer)
> > [2012-06-28 10:31:17,132] WARN Error sending messages, 2 attempts remaining (kafka.producer.async.DefaultEventHandler)
> > [2012-06-28 10:31:17,132] INFO Connected to localhost:9092 for producing (kafka.producer.SyncProducer)
> >
> > But the message that was sent right after the broker restart never
> > reaches the broker. The message after that (the 2nd message after the
> > restart) gets to the broker fine and the sequence continues. Thus, if I
> > restart the broker in the sleep period between messages 4 and 5, I don't
> > get message 5. I get messages 1, 2, 3, 4, 6, 7, ...
> >
> > I tried setting num.retries to 1 and 2, thinking that the first retry
> > might reconnect and the second retry is where it resends the message.
> > But that doesn't work. The number of retries doesn't improve the
> > situation.
> >
> > Can you see any flaw in my testing? What can I do to better test this
> > scenario? How can I ensure that no messages are dropped? I don't think I
> > am losing the message because it's in broker memory. Please correct me
> > if I am wrong.
> >
> > Regards,
> > Vaibhav
> > GumGum <http://gumgum.com>
> >
> > On Wed, Jun 27, 2012 at 3:42 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >
> > > 0.7.1 has this: reconnect.time.interval.ms
> > >
> > > Thanks,
> > >
> > > Joel
> > >
> > > On Wed, Jun 27, 2012 at 3:31 PM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> > >
> > > > That will be awesome. It will definitely address the AWS ELB problem.
> > > >
> > > > +1 for "reconnect.interval".
> > > >
> > > > Regards,
> > > > Vaibhav
> > > > GumGum
> > > >
> > > > On Wed, Jun 27, 2012 at 3:24 PM, Niek Sanders <niek.sand...@gmail.com> wrote:
> > > >
> > > > > Do producers currently leave the sockets to the brokers open
> > > > > indefinitely?
> > > > >
> > > > > It might make sense to add a second producer config param similar to
> > > > > "reconnect.interval" which limits on time instead of message count.
> > > > > (And then reconnect based on whichever criteria is hit first). For
> > > > > folks going through ELBs on AWS, they'd set the reconnect.interval.sec
> > > > > to something like 50 sec as a workaround for low-volume producers.
> > > > >
> > > > > - Niek
> > > > >
> > > > > On Tue, Jun 26, 2012 at 4:52 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > > > Set num.retries in producer config property file. It defaults to 0.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
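
The test described in this thread relies on spotting a gap in the sequence numbers by eye in the tailed broker log. A minimal sketch of automating that check is below. It assumes the payloads look like "_____5_____" (five underscores, the sequence number, five underscores) and that the raw log text has already been read into a string. The class name SequenceGapCheck and the method findMissing are hypothetical names chosen for illustration and are not part of Kafka.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka: finds sequence numbers that never
// made it into the broker log during the restart test.
class SequenceGapCheck {
    // Assumed payload shape: five underscores, digits, five underscores.
    private static final Pattern PAYLOAD = Pattern.compile("_{5}(\\d+)_{5}");

    // Returns every sequence number in [first, last] that does not appear
    // as a payload anywhere in rawLog. Bytes between payloads are ignored.
    static List<Integer> findMissing(String rawLog, int first, int last) {
        Set<Integer> seen = new HashSet<>();
        Matcher m = PAYLOAD.matcher(rawLog);
        while (m.find()) {
            seen.add(Integer.parseInt(m.group(1)));
        }
        List<Integer> missing = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            if (!seen.contains(i)) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for the tailed topic log; the letters play the role of
        // the binary bytes that sit between payloads in the real file.
        String log = "_____0_____x_____1_____y_____2_____z_____3_____"
                   + "_____4_____q_____6_____r_____7_____s_____8_____t_____9_____";
        System.out.println(findMissing(log, 0, 9)); // prints [5]
    }
}
```

Running the check after each broker restart would show whether a change to num.retries actually closes the gap, without relying on reading the binary log output manually.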