StefanRRichter commented on a change in pull request #8322: [FLINK-12364]
Introduce a CheckpointFailureManager to centralized manage checkpoint failure
URL: https://github.com/apache/flink/pull/8322#discussion_r283750480
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
##########
@@ -435,6 +439,12 @@ public boolean triggerCheckpoint(long timestamp, boolean isPeriodic) {
triggerCheckpoint(timestamp, checkpointProperties, null, isPeriodic, false);
return true;
} catch (CheckpointException e) {
+ try {
+ long latestGeneratedCheckpointId = getCheckpointIdCounter().getAndIncrement();
Review comment:
What exactly do you mean? That the access to the ID generator does not
happen under the lock? Then how about just rewriting the trigger checkpoint
method:
The catch does not look well designed anyway. Instead of throwing, you could
report the exception to the failure manager directly, or catch and report it
while still under the lock inside `triggerCheckpoint`.
BTW, I have one more important point: the
`numUnsuccessfulCheckpointsTriggers` counter in the checkpoint coordinator
sounds like something that should now be moved into the failure
manager, wdyt?
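To make the two suggestions concrete, here is a rough sketch and not the actual Flink code. All names in it (`CheckpointFailureSketch`, `FailureManager`, `Coordinator`, `handleTriggerFailure`, `handleTriggerSuccess`, the `tasksReady` flag) are hypothetical stand-ins, and the nested `CheckpointException` is a local placeholder for the real class. It assumes the counter is reset on a successful trigger, and it reports the failure while still holding the lock, without drawing an extra checkpoint ID in the failure path.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CheckpointFailureSketch {

    /** Local placeholder for the real exception type (assumption). */
    static final class CheckpointException extends Exception {
        CheckpointException(String message) {
            super(message);
        }
    }

    /** Hypothetical failure manager that owns the failed-trigger counter. */
    static final class FailureManager {
        private final AtomicInteger numUnsuccessfulCheckpointsTriggers = new AtomicInteger();

        void handleTriggerFailure(CheckpointException cause) {
            // A real implementation would also inspect the cause here.
            numUnsuccessfulCheckpointsTriggers.incrementAndGet();
        }

        void handleTriggerSuccess() {
            // Assumption: the counter tracks consecutive failures only.
            numUnsuccessfulCheckpointsTriggers.set(0);
        }

        int getNumUnsuccessfulCheckpointsTriggers() {
            return numUnsuccessfulCheckpointsTriggers.get();
        }
    }

    /** Hypothetical, heavily simplified coordinator. */
    static final class Coordinator {
        private final Object lock = new Object();
        private final FailureManager failureManager;
        private long nextCheckpointId = 1L;
        private boolean tasksReady = true;

        Coordinator(FailureManager failureManager) {
            this.failureManager = failureManager;
        }

        void setTasksReady(boolean ready) {
            synchronized (lock) {
                tasksReady = ready;
            }
        }

        long getNextCheckpointId() {
            synchronized (lock) {
                return nextCheckpointId;
            }
        }

        // timestamp and isPeriodic are unused here; they only mirror the
        // signature shown in the diff.
        boolean triggerCheckpoint(long timestamp, boolean isPeriodic) {
            synchronized (lock) {
                try {
                    if (!tasksReady) {
                        throw new CheckpointException("tasks not ready");
                    }
                    nextCheckpointId++; // ID is consumed under the lock
                    failureManager.handleTriggerSuccess();
                    return true;
                } catch (CheckpointException e) {
                    // Reported while still under the lock, no rethrow.
                    failureManager.handleTriggerFailure(e);
                    return false;
                }
            }
        }
    }

    public static void main(String[] args) {
        FailureManager manager = new FailureManager();
        Coordinator coordinator = new Coordinator(manager);
        coordinator.setTasksReady(false);
        coordinator.triggerCheckpoint(System.currentTimeMillis(), true);
        System.out.println(manager.getNumUnsuccessfulCheckpointsTriggers());
    }
}
```

With this shape the coordinator no longer needs its own counter or an outer catch block; whether a failed trigger should still consume a checkpoint ID is left open here.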