philipnee opened a new pull request, #15339: URL: https://github.com/apache/kafka/pull/15339
Adding the following rebalance metrics to the consumer:

- `rebalance-latency-avg`
- `rebalance-latency-max`
- `rebalance-latency-total`
- `rebalance-rate-per-hour`
- `rebalance-total`
- `failed-rebalance-rate-per-hour`
- `failed-rebalance-total`

The current protocol (JoinGroup/SyncGroup) and the ConsumerGroup protocol (heartbeat-based) work differently, so we need to redefine when a rebalance starts and ends.

**Start of Rebalance:**
- Current: right before sending out the JoinGroup request
- ConsumerGroup: when the client receives an assignment in the heartbeat (HB) response

**End of Rebalance - Successful Case:**
- Current: on receiving the SyncGroup response after transitioning to "COMPLETING_REBALANCE"
- ConsumerGroup: after completing reconciliation, right before sending out the "Ack" heartbeat

**End of Rebalance - Failed Case:**
- Current: any failure in the JoinGroup/SyncGroup response
- ConsumerGroup: a failure in the heartbeat

Note: Overall, we try to stay consistent with the current protocol. Rebalances start and end with sending and receiving network requests, and failures in those network requests signify failed rebalances. It is entirely possible for a consumer to record multiple failed rebalances before a successful one.
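A minimal sketch of the start/end bookkeeping described above for the ConsumerGroup protocol. Kafka clients normally register metrics through the `org.apache.kafka.common.metrics` `Sensor` API; this sketch avoids that dependency and uses plain JDK code instead. The class `RebalanceMetricsSketch` and its methods are hypothetical. The following are also assumptions: only successful rebalances contribute to latency, a heartbeat failure is only counted while a rebalance is in progress, and the per-hour rates are approximated as simple counts over a trailing one-hour window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: class and method names are illustrative, not Kafka APIs.
public class RebalanceMetricsSketch {
    private static final long HOUR_MS = 60L * 60L * 1000L;

    private long rebalanceStartMs = -1L;     // -1 means no rebalance in progress
    private long rebalanceTotal = 0L;        // rebalance-total (successful rebalances)
    private long failedRebalanceTotal = 0L;  // failed-rebalance-total
    private long latencyTotalMs = 0L;        // rebalance-latency-total
    private long latencyMaxMs = 0L;          // rebalance-latency-max
    private final Deque<Long> successTimesMs = new ArrayDeque<>();
    private final Deque<Long> failureTimesMs = new ArrayDeque<>();

    // Start: the heartbeat response carries a new assignment.
    // Further assignments while a rebalance is in progress keep the original start time.
    public void onAssignmentReceived(long nowMs) {
        if (rebalanceStartMs < 0) {
            rebalanceStartMs = nowMs;
        }
    }

    // Successful end: reconciliation finished, right before the "Ack" heartbeat is sent.
    public void onReconciliationCompleted(long nowMs) {
        if (rebalanceStartMs < 0) {
            return; // nothing in progress, ignore
        }
        long latencyMs = nowMs - rebalanceStartMs;
        latencyTotalMs += latencyMs;
        latencyMaxMs = Math.max(latencyMaxMs, latencyMs);
        rebalanceTotal++;
        successTimesMs.addLast(nowMs);
        rebalanceStartMs = -1L;
    }

    // Failed end (assumption): a heartbeat fails while a rebalance is in progress.
    // The attempt is closed, so the next assignment starts a new rebalance.
    public void onHeartbeatFailed(long nowMs) {
        if (rebalanceStartMs < 0) {
            return;
        }
        failedRebalanceTotal++;
        failureTimesMs.addLast(nowMs);
        rebalanceStartMs = -1L;
    }

    // Counts events in the trailing hour, evicting older timestamps as a side effect.
    private static long countInLastHour(Deque<Long> timesMs, long nowMs) {
        while (!timesMs.isEmpty() && timesMs.peekFirst() <= nowMs - HOUR_MS) {
            timesMs.pollFirst();
        }
        return timesMs.size();
    }

    // rebalance-latency-avg: NaN until the first successful rebalance.
    public double rebalanceLatencyAvg() {
        return rebalanceTotal == 0 ? Double.NaN : (double) latencyTotalMs / rebalanceTotal;
    }
    public long rebalanceLatencyMax() { return latencyMaxMs; }
    public long rebalanceLatencyTotal() { return latencyTotalMs; }
    public long rebalanceTotal() { return rebalanceTotal; }
    public long failedRebalanceTotal() { return failedRebalanceTotal; }
    public long rebalanceRatePerHour(long nowMs) { return countInLastHour(successTimesMs, nowMs); }
    public long failedRebalanceRatePerHour(long nowMs) { return countInLastHour(failureTimesMs, nowMs); }

    public static void main(String[] args) {
        RebalanceMetricsSketch m = new RebalanceMetricsSketch();
        m.onAssignmentReceived(0);
        m.onHeartbeatFailed(50);             // first attempt fails
        m.onAssignmentReceived(100);
        m.onReconciliationCompleted(400);    // 300 ms rebalance
        m.onAssignmentReceived(1_000);
        m.onReconciliationCompleted(1_100);  // 100 ms rebalance
        // prints "2 1 200.0 300"
        System.out.println(m.rebalanceTotal() + " " + m.failedRebalanceTotal()
                + " " + m.rebalanceLatencyAvg() + " " + m.rebalanceLatencyMax());
    }
}
```

Closing the attempt on a heartbeat failure is what lets the sketch record several failed rebalances before a successful one, as the note above describes. Keeping latency only for successful rebalances is a design choice here, and an implementation could also track the latency of failed attempts.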