[ https://issues.apache.org/jira/browse/FLINK-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-6625.
--------------------------------
    Resolution: Won't Do

> Flink removes HA job data when reaching JobStatus.FAILED
> --------------------------------------------------------
>
>                 Key: FLINK-6625
>                 URL: https://issues.apache.org/jira/browse/FLINK-6625
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Till Rohrmann
>            Priority: Major
>
> Currently, Flink removes all job-related data (the submitted {{JobGraph}} as
> well as checkpoints) when a job reaches a globally terminal state (including
> {{JobStatus.FAILED}}). In high availability mode, this means that all data
> is removed from ZooKeeper and there is no way to recover the job by
> restarting the cluster with the same cluster id.
> I think this is problematic, since an application might have failed simply
> because it exhausted its number of restart attempts. Also, the last
> checkpoint information could help in finding out why the job actually
> failed. I propose that we only remove job data when the job reaches the
> state {{JobStatus.FINISHED}} or {{JobStatus.CANCELED}}.
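
A minimal Java sketch of the cleanup rule proposed above, under stated assumptions: the enum TerminalStatus is hypothetical and mirrors only the three globally terminal values of Flink's JobStatus (in that enum the successful terminal state is named FINISHED), and the helper shouldRemoveHaData is also hypothetical. This is not Flink code, and because the issue was closed as Won't Do, it illustrates the proposal rather than Flink's actual behavior.

```java
import java.util.EnumSet;
import java.util.Set;

public class HaCleanupPolicy {

    // Hypothetical stand-in for the globally terminal values of Flink's JobStatus;
    // the real enum has more values (RUNNING, RESTARTING, SUSPENDED, ...).
    enum TerminalStatus { FINISHED, CANCELED, FAILED }

    // Proposed rule: remove the JobGraph and checkpoints from HA storage only for
    // FINISHED or CANCELED jobs. A FAILED job keeps its HA data, so restarting the
    // cluster with the same cluster id could still recover it.
    private static final Set<TerminalStatus> CLEANUP_STATES =
            EnumSet.of(TerminalStatus.FINISHED, TerminalStatus.CANCELED);

    static boolean shouldRemoveHaData(TerminalStatus status) {
        return CLEANUP_STATES.contains(status);
    }

    public static void main(String[] args) {
        // Print the decision for every terminal state.
        for (TerminalStatus s : TerminalStatus.values()) {
            System.out.println(s + " -> remove HA data: " + shouldRemoveHaData(s));
        }
    }
}
```

Using an EnumSet keeps the set of "cleanup" states in one place, so changing the policy means editing a single declaration rather than scattered status checks.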



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
