I think that, rather than relying on sequential numbering of checkpoints, it would be better to add one more signal: `CheckpointExceptionHandler.checkpointSucceeded()`, which resets the counter.
This method can be called in `AsyncCheckpointRunnable.run()`, e.g. after `reportCompletedSnapshotStates` is done:

```
owner.asynchronousCheckpointExceptionHandler.checkpointSucceeded(); // forward it to synchronousCheckpointExceptionHandler inside
```

Checkpoints finish concurrently, so I think we have to use an `AtomicInteger` for `cpFailureCounter` and increment it with `cpFailureCounter.incrementAndGet()`.

[ Full content available at: https://github.com/apache/flink/pull/6567 ]
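To make the suggestion concrete, here is a minimal sketch of a failure counter that is reset by a success signal. It is not the actual handler from the PR: the class name `FailureCountingHandler`, the `tolerableFailures` parameter, the boolean return value of `checkpointFailed()`, and the `currentFailures()` accessor are all assumptions made for illustration. Only `checkpointSucceeded()`, `cpFailureCounter`, and the use of `AtomicInteger.incrementAndGet()` come from the comment above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not Flink's actual CheckpointExceptionHandler:
 * counts checkpoint failures and resets the count on any success.
 */
final class FailureCountingHandler {

    // Assumed configuration value: how many failures in a row are tolerated.
    private final int tolerableFailures;

    // Atomic because async checkpoint runnables may finish on different threads.
    private final AtomicInteger cpFailureCounter = new AtomicInteger(0);

    FailureCountingHandler(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Records a failed checkpoint; returns true once the tolerance is exceeded. */
    boolean checkpointFailed() {
        return cpFailureCounter.incrementAndGet() > tolerableFailures;
    }

    /** The proposed extra signal: a successful checkpoint resets the counter. */
    void checkpointSucceeded() {
        cpFailureCounter.set(0);
    }

    /** Exposed only so the sketch can be inspected; assumed, not part of the proposal. */
    int currentFailures() {
        return cpFailureCounter.get();
    }
}
```

With this shape, the handler never needs to know checkpoint IDs: any success clears the count, regardless of the order in which concurrent checkpoints complete. The trade-off is that a late success from an older checkpoint also resets the counter, which may or may not be the desired semantics.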
