Hi guys,

Today I had a production problem with the Kafka producer (version 0.7.1).

I have a web front-end HTTP server that produces events using the async Kafka producer.
By design, the Kafka producer is a singleton.

I restart that HTTP server often, and most of the time booting up the Kafka producer doesn't take long, and I end up with a single Kafka producer.


Today something went wrong, and I could identify locks on my server. Using jstack, I saw that all the locks came from the producer initializing, specifically in the method getZKBrokerInfo.


Thinking my bottleneck was ZooKeeper, I restarted its nodes one after another, but this didn't solve the problem.


Without having a plan, I restarted one of the Kafka brokers (I have 3), and magically that solved the problem.

I am trying to understand what the problem was.

Could it be that when the producer starts up it tries to connect to each of the brokers, and the broker I picked was not responding? I have a service that monitors load on my broker machines, and it gave no indication of such load. Moreover, I have a service that checks whether a broker responds on port 9092, and this broker responded.
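
For illustration, a port check of this kind can be sketched as below. This is a minimal sketch, not the actual monitoring service: the class name PortCheck and the timeout value are assumptions made for the example. Note that a successful TCP connect only shows that the port accepts connections; it says nothing about whether the broker answers requests in time.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Kafka: checks only that a TCP port accepts connections.
final class PortCheck {
    /** Returns true if a TCP connection to host:port can be opened within timeoutMs. */
    static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // connect() fails with an IOException on refusal or timeout
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A call such as PortCheck.isPortOpen("broker1", 9092, 1000) (host name made up) would report true for a broker whose socket is open even if its request handling is stuck.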

If a broker doesn't respond, shouldn't the producer drop this broker, and use only the others?

Anyhow, I am not expecting any of you to know exactly what happened to my producer, but I am trying to better understand the producer init process. Maybe I should create the connection to the Kafka broker earlier in my app life cycle.
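
One way to do that, sketched under assumptions: start building the singleton on a background thread at application startup, and let request threads wait for it with a timeout rather than blocking indefinitely. The class EagerInit below is hypothetical and contains no Kafka code; the factory passed to it would wrap whatever call constructs the producer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical holder: builds one shared instance in the background at startup.
final class EagerInit<T> {
    private final Future<T> future;

    EagerInit(Callable<T> factory) {
        // Daemon thread, so a hung initialization cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "producer-init");
            t.setDaemon(true);
            return t;
        });
        future = executor.submit(factory);
        executor.shutdown(); // the already submitted task still runs
    }

    /** Returns the instance, or fails fast if it is not ready within the timeout. */
    T get(long timeout, TimeUnit unit) {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            throw new IllegalStateException("instance not ready after " + timeout + " " + unit, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for instance", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("initialization failed", e.getCause());
        }
    }
}
```

With this shape, a request handler that calls get(100, TimeUnit.MILLISECONDS) gets an exception it can handle (for example by dropping or buffering the event) when initialization hangs, instead of piling up behind a lock.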

I am also trying to figure out why there is synchronized code in the producer. Should I not use a single producer?


Thanks
Guy Doulberg
Data infrastructure engineer




