weiqingy commented on code in PR #28484:
URL: https://github.com/apache/flink/pull/28484#discussion_r3447506481
##########
flink-runtime/src/main/java/org/apache/flink/streaming/runtime/tasks/mailbox/TaskMailboxImpl.java:
##########
@@ -247,6 +248,7 @@ private void moveUrgentMailsToBatchIfNeeded(boolean
onlyMoveUrgentMails) {
}
}
+ final boolean shouldOnlyMoveUrgentMails = onlyMoveUrgentMails &&
!isBatchEmpty;
Review Comment:
This boolean is where the whole fix lives, and the *why* is the non-obvious
part: when the batch is empty on entry, you deliberately let non-urgent mails
move in too, because otherwise a non-urgent timer mail queued behind an urgent
barrier gets put back and stranded until the next batch — which is exactly the
between-unaligned-checkpoint timer starvation this PR fixes. A reader six
months out sees `onlyMoveUrgentMails && !isBatchEmpty` with no way to recover
that reasoning, and the surrounding early-returns are all commented, so this
one stands out by its silence. Would a short why-comment here be worth it?
Something like this, if it helps:
```java
// When the batch is empty, also pull non-urgent mails in (not just urgent
ones): otherwise a
// non-urgent mail queued behind an urgent one is put back and stranded
until the next batch,
// starving processing-time timers between unaligned checkpoints.
final boolean shouldOnlyMoveUrgentMails = onlyMoveUrgentMails &&
!isBatchEmpty;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]