[
https://issues.apache.org/jira/browse/FLINK-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433216#comment-15433216
]
Stephan Ewen commented on FLINK-4437:
-------------------------------------
Actually, there are a few more things that can go wrong, for example one thread
overtaking the other, so that checkpoint {{n+1}} is triggered before checkpoint
{{n}}.
I wonder whether we should simply guard the entire method with a lock and not
release it in between. That would be much simpler, and is probably okay as well.
[~tedyu] Do you want to take this issue?
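For illustration, the "guard the entire method with a lock" idea could look
roughly like the sketch below. It is not Flink's actual CheckpointCoordinator:
the class GuardedCoordinator, the fields nextCheckpointId and triggered, and
the boolean return value are hypothetical names made up for this example. Only
lastTriggeredCheckpoint, minPauseBetweenCheckpoints and the interval check come
from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the coordinator; not Flink's real class.
class GuardedCoordinator {
    private final Object lock = new Object();
    private final long minPauseBetweenCheckpoints;

    // All three fields below are only read or written while holding 'lock'.
    private long lastTriggeredCheckpoint = 0;
    private long nextCheckpointId = 1;
    private final List<Long> triggered = new ArrayList<>();

    GuardedCoordinator(long minPauseBetweenCheckpoints) {
        this.minPauseBetweenCheckpoints = minPauseBetweenCheckpoints;
    }

    // The lock is held for the whole method and never released in between,
    // so the interval check and the field updates form one atomic step.
    boolean triggerCheckpoint(long timestamp) {
        synchronized (lock) {
            // make sure the minimum interval between checkpoints has passed
            if (lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp) {
                return false;
            }
            long checkpointId = nextCheckpointId++;
            lastTriggeredCheckpoint = timestamp;
            triggered.add(checkpointId);
            return true;
        }
    }

    // Returns a copy so callers never see the list while it is being modified.
    List<Long> triggeredIds() {
        synchronized (lock) {
            return new ArrayList<>(triggered);
        }
    }
}
```

Because the check and the updates happen under the same lock, concurrent
callers with the same timestamp cannot both pass the check, and checkpoint ids
are handed out in the order in which callers acquire the lock. The cost is that
any slow work done inside the method would block other callers for its whole
duration.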
> Lock evasion around lastTriggeredCheckpoint may lead to lost updates to
> related fields
> --------------------------------------------------------------------------------------
>
> Key: FLINK-4437
> URL: https://issues.apache.org/jira/browse/FLINK-4437
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
>
> In CheckpointCoordinator#triggerCheckpoint():
> {code}
> // make sure the minimum interval between checkpoints has passed
> if (lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp) {
> {code}
> If two threads evaluate
> 'lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp'
> in close proximity before lastTriggeredCheckpoint is updated, the two threads
> may have an inconsistent view of "lastTriggeredCheckpoint", and updates to
> fields correlated with "lastTriggeredCheckpoint" may be lost.
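The interleaving described above can be reproduced with a small, self-contained
sketch. This is not the Flink code: UnguardedCheckDemo, run(), the triggered
counter and the CyclicBarrier are all hypothetical, and the barrier exists only
to force both threads to finish the check before either one updates
lastTriggeredCheckpoint.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of an unguarded check-then-act; not Flink's real class.
class UnguardedCheckDemo {
    static final long minPauseBetweenCheckpoints = 100;
    static volatile long lastTriggeredCheckpoint = 0;
    static final AtomicInteger triggered = new AtomicInteger();

    // Runs two trigger attempts with the same timestamp and returns how many
    // of them passed the minimum-pause check.
    static int run() {
        lastTriggeredCheckpoint = 0;
        triggered.set(0);
        CyclicBarrier bothChecked = new CyclicBarrier(2);

        Runnable attempt = () -> {
            long timestamp = 1000;
            // Step 1: the check, evaluated without holding any lock.
            boolean tooSoon =
                lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp;
            try {
                // Force the problematic interleaving: wait until the other
                // thread has evaluated the check as well.
                bothChecked.await();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            // Step 2: the update, based on a view that may already be stale.
            if (!tooSoon) {
                lastTriggeredCheckpoint = timestamp;
                triggered.incrementAndGet();
            }
        };

        Thread a = new Thread(attempt);
        Thread b = new Thread(attempt);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return triggered.get();
    }
}
```

In this sketch both attempts pass the check, although the minimum pause should
have let only one of them through. In real code the same window exists between
the check and the update whenever the lock is not held across both, just
without a barrier making it deterministic.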