jsancio commented on code in PR #19454:
URL: https://github.com/apache/kafka/pull/19454#discussion_r2044695337


##########
metadata/src/main/java/org/apache/kafka/controller/ClusterControlManager.java:
##########
@@ -309,8 +309,10 @@ public void activate() {
         long nowNs = time.nanoseconds();
         for (BrokerRegistration registration : brokerRegistrations.values()) {
             heartbeatManager.register(registration.id(), registration.fenced());
-            heartbeatManager.tracker().updateContactTime(
-                new BrokerIdAndEpoch(registration.id(), registration.epoch()), nowNs);
+            if (!registration.fenced()) {
+                heartbeatManager.tracker().updateContactTime(
+                    new BrokerIdAndEpoch(registration.id(), registration.epoch()), nowNs);

Review Comment:
   Interesting. This is not a new issue, but because every activation resets 
the contact time of each unfenced broker to the current time, a cluster whose 
controller fails over more often than the heartbeat timeout will never be able 
to fence a broker that has stopped sending heartbeats.
   
   Have you considered persisting some of the session state? What would the 
scalability impact of persisting it be? For example, my understanding is that 
ZooKeeper persists session state, which is how it implements ephemeral nodes.
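
   The failover scenario can be illustrated with a small, self-contained 
simulation. This is only a sketch under stated assumptions: `Tracker` and 
`silentBrokerEverFenced` are hypothetical names, not Kafka classes, time is 
modelled in whole milliseconds with a one-second tick, and every failover is 
assumed to discard the in-memory session state and reset the contact time of 
the unfenced broker, as `activate()` does in the diff above.

```java
import java.util.HashMap;
import java.util.Map;

public class FailoverFencingSketch {

    /** Hypothetical in-memory session tracker; NOT Kafka's actual tracker class. */
    static final class Tracker {
        private final Map<Integer, Long> lastContactMs = new HashMap<>();
        private final long timeoutMs;

        Tracker(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }

        void updateContactTime(int brokerId, long nowMs) {
            lastContactMs.put(brokerId, nowMs);
        }

        boolean hasExpired(int brokerId, long nowMs) {
            Long last = lastContactMs.get(brokerId);
            return last != null && nowMs - last >= timeoutMs;
        }
    }

    /**
     * Simulates one unfenced broker that never sends a heartbeat.
     * Returns true if its session ever expires within totalMs.
     */
    static boolean silentBrokerEverFenced(long timeoutMs, long failoverIntervalMs, long totalMs) {
        final int brokerId = 1;      // assumed broker id, for illustration only
        final long tickMs = 1_000L;  // assumed simulation step
        Tracker tracker = new Tracker(timeoutMs);
        tracker.updateContactTime(brokerId, 0L);
        for (long nowMs = tickMs; nowMs <= totalMs; nowMs += tickMs) {
            if (nowMs % failoverIntervalMs == 0) {
                // A new controller activates: the old session state is gone and
                // the unfenced broker's contact time is reset to "now".
                tracker = new Tracker(timeoutMs);
                tracker.updateContactTime(brokerId, nowMs);
            }
            if (tracker.hasExpired(brokerId, nowMs)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Failover every 5 s with a 9 s timeout, then every 20 s with the same timeout.
        System.out.println(silentBrokerEverFenced(9_000L, 5_000L, 60_000L));
        System.out.println(silentBrokerEverFenced(9_000L, 20_000L, 60_000L));
    }
}
```

With these assumed numbers the first call returns `false`, because the timer 
is reset before it can reach the timeout, and the second returns `true`. 
Persisting the last contact time across failovers would remove the reset, at 
the cost of extra writes to the metadata log.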


