akalash commented on a change in pull request #15728:
URL: https://github.com/apache/flink/pull/15728#discussion_r621160215



##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/DefaultCheckpointPlanCalculator.java
##########
@@ -111,7 +112,10 @@ public void setAllowCheckpointsAfterTasksFinished(boolean 
allowCheckpointsAfterT
                                         ? calculateAfterTasksFinished()
                                         : calculateWithAllTasksRunning();
 
-                        checkTasksStarted(result.getTasksToTrigger());
+                        checkTasksStarted(
+                                isUnalignedCheckpoint
+                                        ? result.getTasksToWaitFor()
+                                        : result.getTasksToTrigger());

Review comment:
       @rkhachatryan 
   > checkpoint timeout must be less than recovery time
   
   Do you mean 'checkpoint timeout must be **greater** than recovery time'? If 
I understand checkpoint timeout correctly, then the checkpoint will be 
discarded and it will start from beginning if checkpoint timeout is less than 
recovery time. 
   
   > one alternative would be to add an option to allow power-users to trigger 
checkpoints while still in recovery, but don't trigger by default.
   
   I don't think that a having an extra option for power-user is a good choice. 
Because as you mentioned by yourself it is narrow use-case and actually I don't 
see the significant advantages in this aproach. So supporting of new feature 
for pretty limited scenarios and with no obvious explanation doesn't seem good.
   
   
   I see only a couple of advantages of starting the checkpoint ASAP(Arvid has 
mentioned about it already):
   
   - Sometimes(depends on bottle neck in recovery) the first checkpoint can 
finish earlier.
   - The first checkpoint can do its work before job really started which can 
influent processing of data in less degree.
   
   In my opinion, the only first point makes sense. It is better to have the 
first checkpoint sooner rather than later. But I still don't understand how it 
is important because for UC which we want to have as primal one, we don't 
support such behaviour. So my position is to have the same behaviour for both 
checkpoints. If we think that the delay in starting of the first checkpoint is 
crucial then we should support it for UC(maybe not in this ticket but in 
general). but if we think that it is not so important then we can remove this 
support from AC.
   
   So in general, it seems to have this feature can be helpful but perhaps, we 
should think about having the same feature for UC.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to