niket-goel commented on a change in pull request #11191:
URL: https://github.com/apache/kafka/pull/11191#discussion_r685437960



##########
File path: metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
##########
@@ -858,7 +858,7 @@ private void rescheduleMaybeFenceStaleBrokers() {
             return;
         }
         scheduleDeferredWriteEvent(MAYBE_FENCE_REPLICAS, nextCheckTimeNs, () -> {
-            ControllerResult<Void> result = replicationControl.maybeFenceStaleBrokers();
+            ControllerResult<Void> result = replicationControl.maybeFenceOneStaleBroker();
             rescheduleMaybeFenceStaleBrokers();

Review comment:
       I thought about doing that, but then realized that the call to
`rescheduleMaybeFenceStaleBrokers()` will automatically schedule another
invocation right away if any stale brokers still remain in the unfenced list.
   
   This happens because `nextCheckTimeNs` is based on when the broker with the
oldest `lastContactNs` (the broker at the head of the list) would time out. If
a stale broker remains, that time is always earlier than now, so the event
should run again with no delay (barring queue depth).
   
   I also did not want to "prioritize" this event (if that's even possible)
over other requests, because my understanding is that the fix works as long as
the subsequent fence-broker call is made after the first one has been processed
by the controller thread (which also happens when the record is picked up from
the queue and replayed).
   
   Please let me know if my understanding is off here.





