Hi Guys,

I started working on this ticket a little more than a week ago:
https://issues.apache.org/jira/browse/KAFKA-5138

Sadly I could not reproduce it, but from the logs Dustin provided and from
the code it looks like this might not be just a MirrorMaker issue but a
consumer one as well.

My theory is:
 1) an MM send failure happens because of heavy load
 2) MM starts to close its producer
 3) during the MM shutdown the source server starts a consumer rebalance
(the consumers could not respond in time because of the heavy load)
 4) the heartbeat response gets delayed
 5) the MM producer is closed, but MM then receives the delayed heartbeat
response and resets the connection
 6) because a thread is left running in the JVM, it cannot shut down
 7) MM hangs

The order may be a bit different; I could not prove it without a
reproduction.
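
To see which thread actually keeps the JVM up in step 6, one option (if
anyone can get their hands on a hanging instance) is a small watcher like
the sketch below. This is plain JDK code, not MirrorMaker internals, and
the class name is made up; attaching jstack to the hung process should show
the same information.

// ThreadWatcher.java - minimal sketch, not MirrorMaker code: a daemon
// thread that periodically logs the non-daemon threads that are still
// alive, so a hung MirrorMaker instance shows which thread keeps the
// JVM from exiting.
public class ThreadWatcher {
    public static void start() {
        Thread watcher = new Thread(() -> {
            while (true) {
                for (Thread t : Thread.getAllStackTraces().keySet()) {
                    if (!t.isDaemon()) {
                        System.err.println("non-daemon thread alive: " + t.getName());
                    }
                }
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "thread-watcher");
        watcher.setDaemon(true); // the watcher itself must not keep the JVM alive
        watcher.start();
    }
}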

I experimented with the following configs set to values under 100 ms and
then stress-tested the source cluster with JMeter (roughly as in the
snippet after the list):
 - request.timeout.ms
 - replica.lag.time.max.ms
 - session.timeout.ms
 - group.min.session.timeout.ms
 - group.max.session.timeout.ms
 - heartbeat.interval.ms
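
On the consumer side that amounts to something like the sketch below; the
exact numbers are made up for illustration, I only kept everything under
100 ms. The broker-side settings (group.min/max.session.timeout.ms,
replica.lag.time.max.ms) were lowered the same way in the broker config.

import java.util.Properties;

// Sketch of the consumer-side overrides used for the reproduction
// attempt; the values are illustrative, everything is simply kept
// under 100 ms.
public class ReproConfig {
    public static Properties consumerOverrides() {
        Properties p = new Properties();
        p.put("request.timeout.ms", "90");
        p.put("session.timeout.ms", "80");    // kept below request.timeout.ms
        p.put("heartbeat.interval.ms", "25"); // must be below session.timeout.ms
        return p;
    }
}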

Could you give me some pointers on how I could reproduce this issue?

Thanks,
Tamas
