Hi guys,
I had a production problem today with the Kafka producer (version 0.7.1).
I have a web front-end HTTP server that produces events using the async
Kafka producer.
The design is that the Kafka producer is a singleton.
I restart that HTTP server often, and most of the time I can see that
booting up the Kafka producer doesn't take too long, and I end up with
a single Kafka producer.
Today something went wrong, and I could identify locks on my server.
Using jstack I saw that all my locks came from the producer initializing,
specifically in the method getZKBrokerInfo.
Thinking my bottleneck was ZooKeeper, I restarted its nodes (one after
another), but this didn't solve the problem.
Without having a plan, I restarted one of the Kafka brokers (I have 3),
and magically that solved the problem.
I am trying to understand what the problem was.
Could it be that when the producer wakes up it tries to connect to each
of the brokers, and the broker I picked was not responding? I have a
service that monitors load on my broker machines and I had no indication
of such load. Moreover, I have a service that checks whether a broker
responds on port 9092, and this broker responded.
If a broker doesn't respond, shouldn't the producer drop this broker,
and use only the others?
Anyhow, I am not expecting any of you to know exactly what happened to
my producer, but I am trying to understand the producer init process
better. Maybe I should create a connection to the Kafka broker earlier
in my app life cycle.
I am also trying to figure out why there is synchronized code in the
producer. Should I not use a single producer?
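
To make the "connect earlier" idea concrete, here is a rough sketch of
what I have in mind: build the shared instance once at startup, on a
separate thread, and fail fast if construction hangs instead of letting
request threads pile up behind it. EagerSingleton, the factory argument
and the timeout value are all hypothetical names of mine, not part of
the Kafka API; the factory would wrap whatever code constructs the real
producer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Kafka): creates one shared instance at
// startup and gives up if construction blocks longer than the timeout.
final class EagerSingleton<T> {
    private final T instance;

    EagerSingleton(Callable<T> factory, long timeoutMillis) throws Exception {
        // Run the factory on a daemon thread so a stuck init cannot keep
        // the JVM alive after we give up on it.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "producer-init");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(factory);
        try {
            // Wait for construction, but only up to the timeout.
            instance = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the init thread if it is interruptible
            throw new IllegalStateException(
                "init did not finish within " + timeoutMillis + " ms", e);
        } finally {
            executor.shutdownNow();
        }
    }

    // Request threads only read the already-built instance; no locking here.
    T get() {
        return instance;
    }
}
```

At server startup I would call something like
new EagerSingleton<>(() -> createProducer(), 10000), where
createProducer() is a placeholder for my own construction code, and
refuse to accept HTTP traffic if it throws. This only moves the blocking
to a place where I can see it; it does not explain why getZKBrokerInfo
was stuck.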
Thanks
Guy Doulberg
Data infrastructure engineer