Github user shixiaogang commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3334#discussion_r103605788
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ---
    @@ -428,6 +450,9 @@ CheckpointTriggerResult triggerCheckpoint(
                        catch (Throwable t) {
                            int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();
                            LOG.warn("Failed to trigger checkpoint (" + numUnsuccessful + " consecutive failed attempts so far)", t);
    +                           if(numUnsuccessful > maxUnsuccessfulCheckpoints) {
    --- End diff --
    
    Here the counter records the total number of failed attempts. Because a
    streaming job is intended to run for a very long time, the number of failed
    attempts will eventually exceed the limit, even if the failures are rare and
    far apart. We should use a different counter here, one that is reset once a
    pending checkpoint successfully completes.
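A minimal sketch of the suggested counter, assuming its semantics are "failed triggers since the last completed checkpoint". The class `ConsecutiveFailureTracker` and its methods are hypothetical names chosen for illustration, not Flink API; how the coordinator would call `onCheckpointCompleted()` from its completion path is likewise an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Flink): counts checkpoint trigger failures
 * since the last successfully completed checkpoint, so that a long-running job
 * is only failed when failures are consecutive rather than cumulative.
 */
class ConsecutiveFailureTracker {

    // Maximum number of consecutive failed triggers tolerated (assumed semantics).
    private final int maxConsecutiveFailures;

    // Failures since the last completed checkpoint; reset to 0 on success.
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    ConsecutiveFailureTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 0) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 0");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Records one failed trigger; returns true once the limit is exceeded. */
    boolean onTriggerFailure() {
        int failures = consecutiveFailures.incrementAndGet();
        return failures > maxConsecutiveFailures;
    }

    /** Called when a pending checkpoint completes: the failure streak ends. */
    void onCheckpointCompleted() {
        consecutiveFailures.set(0);
    }

    /** Current length of the failure streak. */
    int currentFailures() {
        return consecutiveFailures.get();
    }
}
```

With a limit of 2, two failures followed by a completed checkpoint leave the streak at zero, so the job only fails after three failures in a row with no completion in between. A cumulative counter would instead trip on the third failure no matter how far apart the failures were.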

