[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

Congxian Qiu(klion26) (Jira) Mon, 03 Aug 2020 19:11:47 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170497#comment-17170497
 ]


Congxian Qiu(klion26) commented on FLINK-18675:
-----------------------------------------------

[~raviratnakar] I think the problem here is that {{CheckpointRequestDecider}} 
uses a stale value of {{lastCheckpointCompletionRelativeTime}} when checking 
whether the checkpoint request is too early.

1. We retrieve the value of {{lastCheckpointCompletionRelativeTime}} when 
calling {{CheckpointRequestDecider#chooseRequestToExecute}} in 
{{CheckpointCoordinator#triggerCheckpoint}}
2. A pending checkpoint completes and updates the variables 
{{pendingCheckpoints}} and {{lastCheckpointCompletionRelativeTime}}
3. In {{CheckpointRequestDecider#chooseRequestToExecute}} we use the previous 
{{lastCheckpointCompletionRelativeTime}} to check whether the current checkpoint 
request is too early

I think we can solve the problem by reading the value of 
{{lastCheckpointCompletionRelativeTime}} inside 
{{CheckpointRequestDecider#chooseRequestToExecute}} instead of passing it in.
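
To make the ordering above concrete, here is a minimal, self-contained sketch. 
It is not the actual Flink code: the class {{MinPauseSketch}}, the methods 
{{isTooEarly}} and {{replay}}, and the timestamps are all hypothetical, and the 
real fix may also need proper synchronization. It only replays steps 1 to 3 
with a 3 minute minimum pause and compares a value captured early against a 
value read at decision time.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class MinPauseSketch {
    // 3 minutes, matching the configuration in the issue description.
    static final long MIN_PAUSE_MS = 3 * 60 * 1000L;

    // Hypothetical stand-in for the decider's "too early" check.
    // The completion time is read through a supplier at decision time.
    static boolean isTooEarly(long nowMs, LongSupplier lastCompletionMs) {
        return nowMs - lastCompletionMs.getAsLong() < MIN_PAUSE_MS;
    }

    // Replays steps 1-3 and returns {resultWithSnapshot, resultWithFreshRead}.
    static boolean[] replay() {
        // Stand-in for lastCheckpointCompletionRelativeTime (illustrative values).
        AtomicLong lastCompletion = new AtomicLong(0L);

        // Step 1: the caller captures the value before the decision is made.
        long snapshot = lastCompletion.get();

        // Step 2: a pending checkpoint completes and updates the shared value.
        lastCompletion.set(179_000L);

        // Step 3: the decision runs one second after that completion.
        long now = 180_000L;
        boolean withSnapshot = isTooEarly(now, () -> snapshot);
        boolean withFreshRead = isTooEarly(now, lastCompletion::get);
        return new boolean[] {withSnapshot, withFreshRead};
    }

    public static void main(String[] args) {
        boolean[] result = replay();
        System.out.println("too early (captured value): " + result[0]);
        System.out.println("too early (fresh read):     " + result[1]);
    }
}
```

With the captured value the request is not considered too early, so the next 
checkpoint would start one second after the previous one completed. With the 
fresh read the request is held back, which is the behaviour the minimum pause 
setting is meant to give.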

> Checkpoint not maintaining minimum pause duration between checkpoints
> ---------------------------------------------------------------------
>
>                 Key: FLINK-18675
>                 URL: https://issues.apache.org/jira/browse/FLINK-18675
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0
>         Environment: !image.png!
>            Reporter: Ravi Bhushan Ratnakar
>            Priority: Critical
>         Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 on Kubernetes 
> infrastructure. I have configured checkpointing as follows:
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an unusual behaviour. *When a checkpoint completes successfully* 
> *and its end-to-end duration is almost equal to or greater than the Minimum 
> pause duration, the next checkpoint gets triggered immediately without 
> maintaining the Minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
