weiqingy commented on code in PR #28484:
URL: https://github.com/apache/flink/pull/28484#discussion_r3447506481


##########
flink-runtime/src/main/java/org/apache/flink/streaming/runtime/tasks/mailbox/TaskMailboxImpl.java:
##########
@@ -247,6 +248,7 @@ private void moveUrgentMailsToBatchIfNeeded(boolean 
onlyMoveUrgentMails) {
             }
         }
 
+        final boolean shouldOnlyMoveUrgentMails = onlyMoveUrgentMails && 
!isBatchEmpty;

Review Comment:
   This boolean is where the whole fix lives, and the *why* is the non-obvious 
part: when the batch is empty on entry, you deliberately let non-urgent mails 
move in too, because otherwise a non-urgent timer mail queued behind an urgent 
barrier gets put back and stranded until the next batch — which is exactly the 
between-unaligned-checkpoint timer starvation this PR fixes. A reader six 
months out sees `onlyMoveUrgentMails && !isBatchEmpty` with no way to recover 
that reasoning, and the surrounding early-returns are all commented, so this 
one stands out by its silence. Would a short why-comment here be worth it? 
Something like this, if it helps:
   
   ```java
   // When the batch is empty, also pull non-urgent mails in (not just urgent 
ones): otherwise a
   // non-urgent mail queued behind an urgent one is put back and stranded 
until the next batch,
   // starving processing-time timers between unaligned checkpoints.
   final boolean shouldOnlyMoveUrgentMails = onlyMoveUrgentMails && 
!isBatchEmpty;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to