[ https://issues.apache.org/jira/browse/FLINK-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726235#comment-15726235 ]

ASF GitHub Bot commented on FLINK-5007:
---------------------------------------

Github user uce commented on the issue:

    https://github.com/apache/flink/pull/2750
  
    In HA mode, checkpoints are not deleted on suspension. This PR won't change 
that behaviour. It only affects non-HA behaviour.
    
    Currently, the behaviour is to remove checkpoints on suspension, which is 
definitely a problem. But in non-HA mode, suspension also happens on graceful 
shutdown (for example, when terminating a YARN session). Never deleting on 
suspension means that users who have configured `DELETE_ON_CANCELLATION` will 
have externalized checkpoints lingering around after they shut down their 
non-HA cluster. That's why I thought it might be better to treat suspension the 
same way as cancellation and apply the retain/delete-on-cancellation setting.
    
    Does this make sense? For HA, these settings do not apply during suspension.
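A minimal Java sketch of the decision logic described above, assuming HA mode can be represented as a simple boolean flag. `SuspensionCleanupSketch`, `Termination`, and `shouldDiscard` are hypothetical names for illustration only and are not Flink's API; only the two `ExternalizedCheckpointCleanup` values come from the issue.

```java
// Hypothetical sketch of the proposed cleanup decision; not Flink's actual code.
final class SuspensionCleanupSketch {

    // Mirrors the two values of the ExternalizedCheckpointCleanup setting.
    enum ExternalizedCheckpointCleanup { DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION }

    // Hypothetical: the terminal event that triggers checkpoint cleanup.
    enum Termination { CANCELED, SUSPENDED }

    /** Returns true if externalized checkpoints should be deleted. */
    static boolean shouldDiscard(ExternalizedCheckpointCleanup cleanup,
                                 Termination termination,
                                 boolean highAvailability) {
        if (termination == Termination.SUSPENDED && highAvailability) {
            // HA: a new leader may recover from these checkpoints, so never delete.
            return false;
        }
        // Cancellation, and non-HA suspension (graceful shutdown), follow the setting.
        return cleanup == ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION;
    }

    public static void main(String[] args) {
        for (ExternalizedCheckpointCleanup c : ExternalizedCheckpointCleanup.values()) {
            System.out.println(c + ", non-HA suspend -> delete="
                    + shouldDiscard(c, Termination.SUSPENDED, false));
            System.out.println(c + ", HA suspend -> delete="
                    + shouldDiscard(c, Termination.SUSPENDED, true));
        }
    }
}
```

Folding non-HA suspension into the existing cancellation setting avoids introducing a separate configuration option, while HA suspension stays a no-op for cleanup.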


> Retain externalized checkpoint on suspension
> --------------------------------------------
>
>                 Key: FLINK-5007
>                 URL: https://issues.apache.org/jira/browse/FLINK-5007
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>             Fix For: 1.2.0
>
>
> Externalized checkpoints are cleaned up when the job is suspended. 
> Suspensions happen on graceful shutdown (non-HA) or loss of leadership (HA).
> In case of HA, the checkpoint store does not clean up any checkpoints, as they 
> might be recovered by a new leader. The only way to stop an HA job is to 
> actually cancel it. Therefore, the configured cleanup behaviour doesn't 
> matter on suspension.
> In case of non-HA, suspensions happen because of graceful shutdown (for 
> example, stopping a YARN session). In this case, I would treat the cleanup 
> behaviour the same as cancelling the job.
> {code}
> ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION => delete on suspension
> ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION => retain on suspension
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
