[ https://issues.apache.org/jira/browse/FLINK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387789#comment-15387789 ]
ASF GitHub Bot commented on FLINK-4201:
---------------------------------------
Github user uce commented on the issue:
https://github.com/apache/flink/pull/2276
Thanks for taking a look, Stephan.
Regarding your question: *Would we interfere with such a setup when
removing checkpoints on "suspend" in "standalone" mode?*:
Yes, we would interfere, but what you describe is currently **not**
possible with Flink (that is, no one can run it like that). The problem is that
recovery on the master is tightly coupled to ZooKeeper (configured via
`recovery.mode: ZOOKEEPER`). I really like your idea and agree that it should
be possible to run an HA setup like that. I will open an issue for it. Do you
think it's important to fix this for 1.1 already?
Regarding the name *standalone*:
I fully agree. We have both a standalone cluster mode and a standalone recovery
mode, and the standalone recovery mode (`recovery.mode: STANDALONE`) actually
means `NO_RECOVERY`. I think that is also what made you assume that what you
describe is possible, right?
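To make the behavior under discussion concrete, here is a minimal sketch of the cleanup decision. It is an illustration only, not the code in the pull request: the class `CheckpointCleanupSketch`, the enums `RecoveryMode` and `JobState`, and the method `shouldDiscardCheckpoints` are all hypothetical names. It assumes that checkpoints are discarded when a job reaches a terminal state, and on suspend only when the recovery mode is standalone (that is, when no recovery is configured).

```java
// Hypothetical sketch; none of these names are taken from the Flink code base.
class CheckpointCleanupSketch {

    /** Stand-in for the `recovery.mode` setting. STANDALONE effectively means "no recovery". */
    enum RecoveryMode { STANDALONE, ZOOKEEPER }

    /** Simplified job states. SUSPENDED is non-terminal: the job may be picked up again. */
    enum JobState {
        RUNNING(false), SUSPENDED(false), FINISHED(true), CANCELED(true), FAILED(true);

        final boolean terminal;

        JobState(boolean terminal) {
            this.terminal = terminal;
        }
    }

    /** Decides whether the checkpoints of a job may be removed. */
    static boolean shouldDiscardCheckpoints(RecoveryMode mode, JobState state) {
        if (state.terminal) {
            // The job will never run again, so its checkpoints are no longer needed.
            return true;
        }
        // Assumption: on suspend, checkpoints are dropped only when nothing can recover the job.
        return state == JobState.SUSPENDED && mode == RecoveryMode.STANDALONE;
    }

    public static void main(String[] args) {
        for (RecoveryMode mode : RecoveryMode.values()) {
            for (JobState state : JobState.values()) {
                System.out.println(mode + " / " + state + " -> discard = "
                        + shouldDiscardCheckpoints(mode, state));
            }
        }
    }
}
```

Under these assumptions, a suspended job keeps its checkpoints in ZooKeeper recovery mode, which is the case the bug report is about.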
> Checkpoints for jobs in non-terminal state (e.g. suspended) get deleted
> -----------------------------------------------------------------------
>
> Key: FLINK-4201
> URL: https://issues.apache.org/jira/browse/FLINK-4201
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Reporter: Stefan Richter
> Assignee: Ufuk Celebi
> Priority: Blocker
>
> For example, when shutting down a Yarn session, according to the logs
> checkpoints for jobs that did not terminate are deleted. In the shutdown
> hook, removeAllCheckpoints is called and removes checkpoints that should
> still be kept.