[
https://issues.apache.org/jira/browse/FLINK-22088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340068#comment-17340068
]
Yun Gao commented on FLINK-22088:
---------------------------------
Hi [~pnowojski], I think that with the current implementation the TM cannot
decline the checkpoint if the task is already gone. In that case the
checkpoint would eventually fail by expiring, and with the default tolerable
failure number that should cause one failover.
If the task is not gone yet, the checkpoint can be declined because the task
is not in the RUNNING state.
> CheckpointCoordinator might not be able to abort triggering checkpoint if
> failover happens during triggering
> ------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-22088
> URL: https://issues.apache.org/jira/browse/FLINK-22088
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.12.2, 1.13.0
> Reporter: Yun Gao
> Priority: Minor
>
> Currently, when a job fails over, it tries to cancel all pending
> checkpoints via CheckpointCoordinatorDeActivator#jobStatusChanges ->
> stopCheckpointScheduler, which also sets periodicScheduling to false.
> If a checkpoint has just started triggering at this time, it might have
> acquired all the executions to trigger before the failover and then go on
> triggering. Ideally it should be aborted in createPendingCheckpoint ->
> preCheckGlobalState. However, since the check and the creation of the
> pending checkpoint happen in two different lock scopes,
> CheckpointCoordinator#stopCheckpointScheduler may run between the two
> scopes.
> We may optimize this check. However, since the executions would eventually
> fail to trigger the checkpoint, it should not affect the correctness of the
> job. Besides, even if we optimize it, there might still be cases where
> triggering on an execution fails due to a concurrent failover.
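
The window between the two lock scopes described above can be illustrated
with a small, self-contained Java sketch. This is not Flink's code:
ToyCoordinator and all of its methods are hypothetical stand-ins, and the
interleaving is replayed on a single thread so that the result is
deterministic. It only shows why a check in one synchronized block and the
creation in another leave room for a stop call in between, and how a re-check
inside the creating scope would close that window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a coordinator; names do not match Flink's classes.
class ToyCoordinator {
    private final Object lock = new Object();
    private boolean stopped = false;
    private final List<Long> pending = new ArrayList<>();

    // Lock scope 1: the global pre-check.
    boolean preCheck() {
        synchronized (lock) {
            return !stopped;
        }
    }

    // Lock scope 2: registers the pending checkpoint WITHOUT re-checking.
    void createPendingUnchecked(long id) {
        synchronized (lock) {
            pending.add(id);
        }
    }

    // Alternative: check and create inside the same lock scope.
    boolean createPendingChecked(long id) {
        synchronized (lock) {
            if (stopped) {
                return false;
            }
            pending.add(id);
            return true;
        }
    }

    // Models a stop on failover: cancels whatever is pending right now.
    void stopScheduler() {
        synchronized (lock) {
            stopped = true;
            pending.clear();
        }
    }

    int pendingCount() {
        synchronized (lock) {
            return pending.size();
        }
    }
}

class TwoLockScopeRace {
    // Replays the problematic interleaving: check, stop, then create.
    static int racyPendingCount() {
        ToyCoordinator c = new ToyCoordinator();
        boolean ok = c.preCheck();   // passes, coordinator not stopped yet
        c.stopScheduler();           // failover lands between the two scopes
        if (ok) {
            c.createPendingUnchecked(1L); // created after the stop, never cancelled
        }
        return c.pendingCount();
    }

    // Same interleaving, but the creating scope re-checks the flag.
    static int checkedPendingCount() {
        ToyCoordinator c = new ToyCoordinator();
        boolean ok = c.preCheck();
        c.stopScheduler();
        if (ok) {
            c.createPendingChecked(1L);   // rejected because stopped is set
        }
        return c.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("unchecked create, pending after stop: " + racyPendingCount());   // 1
        System.out.println("checked create, pending after stop: " + checkedPendingCount()); // 0
    }
}
```

Even with the re-check, the sketch matches the last point of the description:
a failover can still arrive after the pending checkpoint has been created, in
which case triggering on the executions is what fails.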