pnowojski commented on a change in pull request #14057:
URL: https://github.com/apache/flink/pull/14057#discussion_r540854209



##########
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/AlternatingController.java
##########
@@ -193,6 +202,11 @@ private CheckpointBarrierBehaviourController 
chooseController(CheckpointBarrier
 
        private boolean canTimeout(CheckpointBarrier barrier) {
                return barrier.getCheckpointOptions().isTimeoutable() &&
-                       barrier.getCheckpointOptions().getAlignmentTimeout() < 
(System.currentTimeMillis() - barrier.getTimestamp());
+                       barrier.getId() <= lastSeenBarrier &&
+                       barrier.getCheckpointOptions().getAlignmentTimeout() * 
1_000_000 < (System.nanoTime() - firstBarrierArrivalTime);
+       }
+
+       private long getArrivalTime(CheckpointBarrier announcedBarrier) {
+               return announcedBarrier.getCheckpointOptions().isTimeoutable() 
? System.nanoTime() : Long.MAX_VALUE;

Review comment:
       > Why, could you explain?
   
   On a second thought, maybe it will partially, but not fully as well.  guess 
we still have the code, that we can timeout to UC on the last processed 
barrier? So in case of single channel:
   1. announcement is processed (first announcement will never timeout in this 
version) it won't timeout
   2. barrier will be processed, and only it can timeout
   
   So a timeout on the first network exchange will work worse. That's a bit 
problematic, especially for simple jobs, with for example just a single 
exchange. Previous version would cut the checkpointing time by half, this 
version will do worse than that, in a way that's hard to quantify for me. 
   
   There is some extreme corner case when imagine there is a heavy back 
pressure, but all CB are processed at the same time. That means announcements 
in this version wouldn't cause timeout (it would in my older proposal), and 
this version will need to wait for some CB to be processed (which can take long 
time).
   
   Active timeout would alleviate this problem though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to