[jira] [Created] (FLINK-8871) Checkpoint cancellation is not propagated to stop checkpointing threads on the task manager

Stefan Richter (JIRA) Mon, 05 Mar 2018 09:03:35 -0800

Stefan Richter created FLINK-8871:
-------------------------------------

             Summary: Checkpoint cancellation is not propagated to stop 
checkpointing threads on the task manager
                 Key: FLINK-8871
                 URL: https://issues.apache.org/jira/browse/FLINK-8871
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.4.1, 1.3.2, 1.5.0
            Reporter: Stefan Richter
             Fix For: 1.6.0



Flink currently has no feedback mechanism from the job manager / checkpoint 
coordinator to the tasks for failing a checkpoint. As a result, snapshots 
running on the tasks are not stopped even when their owning checkpoint has 
already been cancelled. Two cases where this applies are checkpoint timeouts 
and local checkpoint failures on a task combined with a configuration that 
does not fail tasks on checkpoint failure. Note that those running snapshots 
no longer count toward the maximum number of parallel checkpoints, because 
their owning checkpoint is considered cancelled.

Not stopping the task's snapshot thread can lead to a problematic situation 
in which the next checkpoint has already started while the abandoned snapshot 
thread from a previous checkpoint is still running. This scenario can 
cascade: many parallel checkpoints slow down checkpointing and make timeouts 
even more likely.

 

A possible solution is to introduce a {{cancelCheckpoint}} method as the 
counterpart to the {{triggerCheckpoint}} method in the task manager gateway, 
invoked by the checkpoint coordinator as part of cancelling the checkpoint.
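
To illustrate the proposal, here is a minimal, self-contained sketch of what 
such a counterpart could look like on the task side. It is not Flink code: 
only the names {{triggerCheckpoint}} and {{cancelCheckpoint}} come from this 
issue, while the {{TaskGateway}} interface, the {{SnapshottingTask}} class, 
the method signatures and the use of a plain executor are assumptions made 
for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CheckpointCancellationSketch {

    /** Hypothetical gateway; the signatures are assumptions, not Flink's API. */
    interface TaskGateway {
        void triggerCheckpoint(long checkpointId);

        /** Returns true if a tracked snapshot for this checkpoint was found and cancelled. */
        boolean cancelCheckpoint(long checkpointId);
    }

    /** Hypothetical task that tracks its running snapshots by checkpoint id. */
    static class SnapshottingTask implements TaskGateway {
        private final ExecutorService snapshotExecutor = Executors.newCachedThreadPool();
        private final Map<Long, Future<?>> runningSnapshots = new ConcurrentHashMap<>();

        @Override
        public void triggerCheckpoint(long checkpointId) {
            Future<?> snapshot = snapshotExecutor.submit(() -> {
                try {
                    Thread.sleep(60_000); // stands in for a slow snapshot
                } catch (InterruptedException e) {
                    // Cancellation arrives as an interrupt; restore the flag and stop.
                    Thread.currentThread().interrupt();
                }
            });
            runningSnapshots.put(checkpointId, snapshot);
        }

        @Override
        public boolean cancelCheckpoint(long checkpointId) {
            Future<?> snapshot = runningSnapshots.remove(checkpointId);
            // cancel(true) interrupts the snapshot thread if it is already running.
            return snapshot != null && snapshot.cancel(true);
        }

        /** Number of tracked snapshots that have neither finished nor been cancelled. */
        int runningSnapshotCount() {
            runningSnapshots.values().removeIf(Future::isDone);
            return runningSnapshots.size();
        }

        void shutdown() {
            snapshotExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        SnapshottingTask task = new SnapshottingTask();
        task.triggerCheckpoint(1L);
        // The coordinator declares checkpoint 1 cancelled (e.g. on timeout)
        // and propagates that decision to the task.
        task.cancelCheckpoint(1L);
        System.out.println("running snapshots: " + task.runningSnapshotCount());
        task.shutdown();
    }
}
```

In this sketch the coordinator would call {{cancelCheckpoint}} with the same 
id it passed to {{triggerCheckpoint}}, so an abandoned snapshot thread is 
interrupted instead of running alongside the next checkpoint. A real 
implementation would additionally have to clean up any partially written 
state, which is left out here.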



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
