[ https://issues.apache.org/jira/browse/KAFKA-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sriram Subramanian resolved KAFKA-706.
--------------------------------------

    Resolution: Fixed

The dependent bug has been fixed

> broker appears to be encoding ProduceResponse, but never sending it
> -------------------------------------------------------------------
>
>                 Key: KAFKA-706
>                 URL: https://issues.apache.org/jira/browse/KAFKA-706
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>         Environment: reproduced on both Mac OS and RH linux, via private
> node.js client
>            Reporter: ben fleis
>            Assignee: Sriram Subramanian
>
> By all appearances, I seem to be able to convince a broker to periodically
> encode, but never transmit, a ProduceResponse. Unfortunately my client is
> proprietary, but I will share it with Neha via LI channels. But I will
> describe what's going on in the hopes that there's another trivial way to
> reproduce it. (I did search through JIRA, and haven't found anything that
> looks like this.)
> I am running a single instance zookeeper and single broker. I have a client
> that generates configurable amounts of data, tracking what is produced (both
> sent and ACK'd), and what is consumed. I was noticing that when using high
> transfer rates via high frequency single messages, my unack'd queue appeared
> to be getting continuously larger. So, I outfitted my client to log more
> information about correlation ids at various stages, and modified the kafka
> ProducerRequest/ProducerResponse to log (de)serialization of the same. I
> then used tcpdump to intercept all communications between my client and the
> broker. Finally, I configured my client to generate 1 message per ~10ms,
> each payload being approximately 33 bytes; requestAckTimeout was set to
> 2000ms, and requestAcksRequired was set to 1. I used 10ms as I found that
> 5ms or less caused my unacked queue to build up due to system speed -- it
> simply couldn't keep up. 10ms keeps the load high, but just manageable.
> YMMV with that param.
> All of this is done on a single host, over loopback.
> I ran it on both my airbook, and a well setup RH linux box, and found the
> same problem.
> At startup, my system logged "expired" requests - meaning reqs that were
> sent, but for which no ACK, positive or negative, was seen from the broker,
> within 1.25x the requestAckTimeout (ie, 2500ms). I would let it settle until
> the unacked queue was stable at or around 0.
> What I found is this: ACKs are normally generated within milliseconds. This
> was demonstrated by my logging added to the scala ProducerRe* classes, and
> they are normally seen quickly by my client. But when the actual error
> occurs, namely that a request is ignored, the ProducerResponse class *does*
> encode the correct correlationId; however, a response containing that ID is
> never sent over the network, as evidenced by my tcpdump traces. In my
> experience this would take anywhere from 3-15 seconds to occur after the
> system was warm, meaning that it's 1 out of several hundred on average that
> shows the condition.
> While I can't attach my client code, I could attach logs; but since my
> intention is to share the code with LI people, I will wait to see if that's
> useful here.
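
For anyone trying to reproduce this without the proprietary client, the client-side bookkeeping described in the report (record each correlation id on send, clear it on ACK, and flag it as "expired" once 1.25x requestAckTimeout has passed with no response) can be sketched roughly as below. This is a minimal Python sketch, not the reporter's node.js client: `UnackedTracker`, its method names, and the injectable `clock` are hypothetical names introduced only for illustration, and the 2000ms timeout and 1.25 factor are taken from the report.

```python
import time
from typing import Callable, Dict, List


class UnackedTracker:
    """Tracks produce requests by correlation id until they are acked or expire.

    Hypothetical helper for illustration; not part of Kafka or any client library.
    """

    def __init__(self, ack_timeout_ms: int = 2000, grace: float = 1.25,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # A request counts as expired after grace * ack_timeout_ms (2500ms here).
        self.expiry_s = ack_timeout_ms * grace / 1000.0
        self.clock = clock
        self.pending: Dict[int, float] = {}  # correlation id -> send time (seconds)

    def sent(self, correlation_id: int) -> None:
        """Record that a request with this correlation id went out."""
        self.pending[correlation_id] = self.clock()

    def acked(self, correlation_id: int) -> bool:
        """Clear a request on ACK; returns False if the id is not pending."""
        return self.pending.pop(correlation_id, None) is not None

    def expire(self) -> List[int]:
        """Remove and return ids that saw no response within the expiry window."""
        now = self.clock()
        expired = [cid for cid, sent_at in self.pending.items()
                   if now - sent_at >= self.expiry_s]
        for cid in expired:
            del self.pending[cid]
        return expired
```

Calling `expire()` periodically and logging the returned ids gives a list that can be cross-checked against the broker-side (de)serialization logging and the tcpdump trace to see which correlation ids were encoded but never appeared on the wire.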