philipnee opened a new pull request, #15339:
URL: https://github.com/apache/kafka/pull/15339

   Adding the following rebalance metrics to the consumer:
   
   rebalance-latency-avg
   rebalance-latency-max
   rebalance-latency-total
   rebalance-rate-per-hour
   rebalance-total
   failed-rebalance-rate-per-hour
   failed-rebalance-total
   
   Because the two protocols differ, we need to redefine when a rebalance starts and ends.
   **Start of Rebalance:**
   Current: Right before sending out JoinGroup
   ConsumerGroup: When the client receives assignments from the heartbeat (HB)
   
   **End of Rebalance - Successful Case:**
   Current: Receiving the SyncGroup response after transitioning to "COMPLETING_REBALANCE"
   ConsumerGroup: After completing reconciliation and right before sending out the "Ack" heartbeat
   
   **End of Rebalance - Failed Case:**
   Current: Any failure in the JoinGroup/SyncGroup response
   ConsumerGroup: Failure in the heartbeat
   
   Note: After all, we try to be consistent with the current protocol. Rebalances start and end with sending and receiving network requests. A failure in one of these network requests signals a rebalance failure to the user. It is entirely possible to have multiple failures before a successful rebalance.
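
   To illustrate how the start, end, and failure points above could feed the listed metrics, here is a minimal, self-contained sketch. It is not the code in this PR: the class `RebalanceMetricsSketch` and all of its method names are hypothetical, timestamps are passed in by the caller, and the two per-hour rate metrics are left out because they depend on a windowing scheme not described here. The sketch also assumes that a failure does not reset the start time, so latency runs from the first start to the eventual success.

```java
/**
 * Hypothetical sketch, not Kafka code: derives rebalance latency and count
 * values from start / success / failure events supplied by the caller.
 */
class RebalanceMetricsSketch {
    private static final long NOT_STARTED = -1L;

    private long startMs = NOT_STARTED;
    private long latencyTotalMs = 0L;       // cf. rebalance-latency-total
    private long latencyMaxMs = 0L;         // cf. rebalance-latency-max
    private long rebalanceTotal = 0L;       // cf. rebalance-total
    private long failedRebalanceTotal = 0L; // cf. failed-rebalance-total

    /** Start of a rebalance; ignored if one is already in progress. */
    void onRebalanceStart(long nowMs) {
        if (startMs == NOT_STARTED) {
            startMs = nowMs;
        }
    }

    /** Successful end of a rebalance; ignored if none is in progress. */
    void onRebalanceSuccess(long nowMs) {
        if (startMs == NOT_STARTED) {
            return;
        }
        long latencyMs = nowMs - startMs;
        latencyTotalMs += latencyMs;
        latencyMaxMs = Math.max(latencyMaxMs, latencyMs);
        rebalanceTotal++;
        startMs = NOT_STARTED;
    }

    /** Failed attempt; assumption: the start time is kept for the retry. */
    void onRebalanceFailure() {
        failedRebalanceTotal++;
    }

    long latencyTotalMs() { return latencyTotalMs; }
    long latencyMaxMs() { return latencyMaxMs; }
    long rebalanceTotal() { return rebalanceTotal; }
    long failedRebalanceTotal() { return failedRebalanceTotal; }

    /** Average latency of successful rebalances; NaN before the first one. */
    double latencyAvgMs() {
        return rebalanceTotal == 0 ? Double.NaN : (double) latencyTotalMs / rebalanceTotal;
    }

    public static void main(String[] args) {
        RebalanceMetricsSketch m = new RebalanceMetricsSketch();
        m.onRebalanceStart(1000L);
        m.onRebalanceFailure();        // one failure before the success
        m.onRebalanceSuccess(1800L);   // latency measured from the first start
        System.out.println("avg=" + m.latencyAvgMs()
                + " max=" + m.latencyMaxMs()
                + " total=" + m.latencyTotalMs()
                + " rebalances=" + m.rebalanceTotal()
                + " failed=" + m.failedRebalanceTotal());
    }
}
```

   In this sketch a failed attempt increments only the failure count, which mirrors the note that several failures may precede one successful rebalance.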

