akalash commented on a change in pull request #15728:
URL: https://github.com/apache/flink/pull/15728#discussion_r621160215
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/DefaultCheckpointPlanCalculator.java
##########
@@ -111,7 +112,10 @@ public void setAllowCheckpointsAfterTasksFinished(boolean
allowCheckpointsAfterT
? calculateAfterTasksFinished()
: calculateWithAllTasksRunning();
- checkTasksStarted(result.getTasksToTrigger());
+ checkTasksStarted(
+ isUnalignedCheckpoint
+ ? result.getTasksToWaitFor()
+ : result.getTasksToTrigger());
Review comment:
@rkhachatryan
> checkpoint timeout must be less than recovery time
Do you mean 'checkpoint timeout must be **greater** than recovery time'? If
I understand checkpoint timeout correctly, then the checkpoint will be
discarded and it will start from beginning if checkpoint timeout is less than
recovery time.
> one alternative would be to add an option to allow power-users to trigger
checkpoints while still in recovery, but don't trigger by default.
I don't think that a having an extra option for power-user is a good choice.
Because as you mentioned by yourself it is narrow use-case and actually I don't
see the significant advantages in this aproach. So supporting of new feature
for pretty limited scenarios and with no obvious explanation doesn't seem good.
I see only a couple of advantages of starting the checkpoint ASAP(Arvid has
mentioned about it already):
- Sometimes(depends on bottle neck in recovery) the first checkpoint can
finish earlier.
- The first checkpoint can do its work before job really started which can
influent processing of data in less degree.
In my opinion, the only first point makes sense. It is better to have the
first checkpoint sooner rather than later. But I still don't understand how it
is important because for UC which we want to have as primal one, we don't
support such behaviour. So my position is to have the same behaviour for both
checkpoints. If we think that the delay in starting of the first checkpoint is
crucial then we should support it for UC(maybe not in this ticket but in
general). but if we think that it is not so important then we can remove this
support from AC.
So in general, it seems to have this feature can be helpful but perhaps, we
should think about having the same feature for UC.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]