yanghua commented on a change in pull request #8322: [FLINK-12364] Introduce a
CheckpointFailureManager to centralized manage checkpoint failure
URL: https://github.com/apache/flink/pull/8322#discussion_r284070033
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
##########
@@ -435,6 +439,12 @@ public boolean triggerCheckpoint(long timestamp, boolean
isPeriodic) {
triggerCheckpoint(timestamp, checkpointProperties,
null, isPeriodic, false);
return true;
} catch (CheckpointException e) {
+ try {
+ long latestGeneratedCheckpointId =
getCheckpointIdCounter().getAndIncrement();
Review comment:
> BTW, I have one more important additional point, which is the
`numUnsuccessfulCheckpointsTriggers` in checkpoint coordinator, which
absolutely sounds like something that should now be moved into the failure
manager, wdyt?
After we provided `CheckpointFailureManager`, IMO the
`numUnsuccessfulCheckpointsTriggers` is not valuable. Currently, it is
incremented in the trigger lock of method `triggerCheckpoint`. The domain of
trigger lock is the subset of the trigger phase which been
`CheckpointFailureManager` understood. So the counting is not correct. And it
is just for logging purpose. So I suggest we could remove it in this PR or in
the third step (next step). If the logging is really necessary, we could do it
again after we implemented the new counting logic based on checkpoint id
sequence.
What's your opinion?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services