Hi guys, I started working on this ticket a little more than a week ago: https://issues.apache.org/jira/browse/KAFKA-5138
Sadly, I could not reproduce it, but based on the logs Dustin provided and on the code, this might not be just a MirrorMaker issue but a consumer one. My theory:

1) An MM send failure happens because of heavy load.
2) MM starts to close its producer.
3) During MM shutdown, the source cluster starts a consumer rebalance (the consumers couldn't respond because of the heavy load).
4) The heartbeat response gets delayed.
5) The MM producer is closed, but MM receives the heartbeat response and resets the connection.
6) Because a thread is left behind in the JVM, the process can't shut down.
7) MM hangs.

Maybe the order is a bit different; I couldn't prove it without reproducing the issue. I set the following configs to values under 100 ms and then stress tested the source cluster with JMeter:

- request.timeout.ms
- replica.lag.time.max.ms
- session.timeout.ms
- group.min.session.timeout.ms
- group.max.session.timeout.ms
- heartbeat.interval.ms

Could you give me some pointers on how I could reproduce this issue?

Thanks,
Tamas
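P.S. For reference, here is roughly how those configs split between broker and client, with hypothetical sub-100 ms values of the kind I used (the exact numbers below are illustrative, not the ones from my runs). Note that session.timeout.ms must fall within the broker's group.min/max.session.timeout.ms range, and heartbeat.interval.ms is typically set to about a third of session.timeout.ms:

```properties
# --- broker (server.properties) --- illustrative values, not my exact runs
replica.lag.time.max.ms=90
group.min.session.timeout.ms=50
group.max.session.timeout.ms=100

# --- consumer (consumer.properties, used by MirrorMaker) ---
request.timeout.ms=95
session.timeout.ms=80      # must be within the broker's min/max range above
heartbeat.interval.ms=25   # roughly session.timeout.ms / 3
```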