[ 
https://issues.apache.org/jira/browse/FLINK-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433216#comment-15433216
 ] 

Stephan Ewen commented on FLINK-4437:
-------------------------------------

Actually, a few more things can go wrong here, for example one thread 
overtaking the other, so that checkpoint {{n+1}} is triggered before checkpoint 
{{n}}.

I wonder whether we should simply guard the entire method with a lock and not 
release it in between. That would be much simpler, and probably okay as well.
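
A rough sketch of what "guard the entire method" could look like. This is not the actual CheckpointCoordinator code: the class name GuardedCheckpointTrigger, the nextCheckpointId counter, and the -1 return value are made up for illustration. Only lastTriggeredCheckpoint, minPauseBetweenCheckpoints and timestamp come from the issue.

```java
// Hypothetical stand-in for the coordinator; NOT Flink's actual class.
final class GuardedCheckpointTrigger {

    private final Object lock = new Object();
    private final long minPauseBetweenCheckpoints;

    // Both fields are read and written only while holding `lock`.
    private long lastTriggeredCheckpoint;
    private long nextCheckpointId = 1; // assumed id counter, for illustration

    GuardedCheckpointTrigger(long minPauseBetweenCheckpoints) {
        this.minPauseBetweenCheckpoints = minPauseBetweenCheckpoints;
    }

    /** Returns the new checkpoint id, or -1 if the minimum pause has not passed. */
    long triggerCheckpoint(long timestamp) {
        // One lock scope covers the check and every dependent update, so no
        // other thread can run the check against a stale lastTriggeredCheckpoint.
        synchronized (lock) {
            // make sure the minimum interval between checkpoints has passed
            if (lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp) {
                return -1;
            }
            long checkpointId = nextCheckpointId++;
            lastTriggeredCheckpoint = timestamp;
            return checkpointId;
        }
    }
}
```

Because ids are handed out and lastTriggeredCheckpoint is updated under the same lock as the interval check, checkpoint {{n+1}} cannot be assigned before checkpoint {{n}} in this sketch. The cost is that any slow work done inside the method would also hold the lock.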

[~tedyu] Do you want to take this issue?

> Lock evasion around lastTriggeredCheckpoint may lead to lost updates to 
> related fields
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-4437
>                 URL: https://issues.apache.org/jira/browse/FLINK-4437
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Ted Yu
>
> In CheckpointCoordinator#triggerCheckpoint():
> {code}
>         // make sure the minimum interval between checkpoints has passed
>         if (lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp) {
> {code}
> If two threads evaluate
> 'lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp'
> in close proximity before lastTriggeredCheckpoint is updated, the two
> threads may have an inconsistent view of "lastTriggeredCheckpoint", and
> updates to fields correlated with "lastTriggeredCheckpoint" may be lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
