[
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170497#comment-17170497
]
Congxian Qiu(klion26) commented on FLINK-18675:
-----------------------------------------------
[~raviratnakar] I think the problem here is that {{CheckpointRequestDecider}}
has a wrong value of {{lastCheckpointCompletionRelativeTime}} when checking
whether the checkpoint request is too early.
1. We retrieve the value of {{lastCheckpointCompletionRelativeTime}} when
calling {{CheckpointRequestDecider#chooseRequestToExecute}} in
{{CheckpointCoordinator#triggerCheckpoint}}
2. A pending checkpoint completes and updates the variables
{{pendingCheckpoints}} and {{lastCheckpointCompletionRelativeTime}}
3. In {{CheckpointRequestDecider#chooseRequestToExecute}} we use the previous
(now stale) {{lastCheckpointCompletionRelativeTime}} to check whether the
current checkpoint request is too early
I think we can read the value of {{lastCheckpointCompletionRelativeTime}}
inside {{CheckpointRequestDecider#chooseRequestToExecute}} itself, at the
moment of the check, to solve the problem.
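To illustrate the three steps above, here is a minimal standalone sketch. It is
not Flink's actual code: the class {{MinPauseSketch}}, its methods, its field
and the timestamps are all hypothetical and only model the idea. The first
check works on a value captured before the pending checkpoint completed; the
second reads the value through a {{LongSupplier}} at decision time, which is
roughly what I am proposing.

```java
import java.util.function.LongSupplier;

public class MinPauseSketch {
    // Minimum pause of 3 minutes, as in the reported configuration.
    static final long MIN_PAUSE_MS = 180_000L;

    // Hypothetical stand-in for lastCheckpointCompletionRelativeTime.
    static long lastCompletionMs = 0L;

    /** Racy variant: decides from a value the caller captured earlier. */
    static boolean tooEarlyWithSnapshot(long snapshotOfLastCompletionMs, long nowMs) {
        return nowMs - snapshotOfLastCompletionMs < MIN_PAUSE_MS;
    }

    /** Proposed variant: reads the latest value at decision time. */
    static boolean tooEarlyWithSupplier(LongSupplier lastCompletion, long nowMs) {
        return nowMs - lastCompletion.getAsLong() < MIN_PAUSE_MS;
    }

    public static void main(String[] args) {
        // Step 1: the trigger path captures the current value.
        long snapshot = lastCompletionMs;

        // Step 2: a pending checkpoint completes and updates the field.
        lastCompletionMs = 200_000L;

        // Step 3: the decision is made 1 ms after that completion.
        long now = 200_001L;

        // Stale snapshot: the request is not considered too early,
        // so the next checkpoint would be triggered immediately.
        System.out.println(tooEarlyWithSnapshot(snapshot, now));

        // Fresh read: the request is correctly considered too early.
        System.out.println(tooEarlyWithSupplier(() -> lastCompletionMs, now));
    }
}
```

With the stale snapshot the minimum pause appears to have elapsed already,
which matches the behaviour in the report; with the supplier the same request
is rejected as too early.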
> Checkpoint not maintaining minimum pause duration between checkpoints
> ---------------------------------------------------------------------
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.0
> Environment: !image.png!
> Reporter: Ravi Bhushan Ratnakar
> Priority: Critical
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoints - 1
>
> Other configs
> Time Characteristics - Processing Time
>
> I am observing an unusual behaviour. *When a checkpoint completes successfully*
> *and its end-to-end duration is almost equal to or greater than the Minimum
> pause duration, the next checkpoint gets triggered immediately without
> maintaining the Minimum pause duration*. Kindly notice this behaviour from
> checkpoint id 194 onward in the attached screenshot.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)