[
https://issues.apache.org/jira/browse/FLINK-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858550#comment-16858550
]
Aljoscha Krettek commented on FLINK-12619:
------------------------------------------
I think there might be some misunderstanding. For the short term, my suggestion
only means that we slightly adjust how we think about new feature proposals
like FLIP-41, this feature and FLINK-6755. I mentioned incremental savepoints
only as a possible future development.
My main point is that the distinction between checkpoints and savepoints is
that the former are system controlled while the latter are user controlled and
that we should keep that distinction. The motivation for this issue and for
FLINK-6755 is to have a more light-weight alternative to savepoints. I think
the solution for that is to allow savepoints to be taken in different formats,
for example the format that is currently used by checkpoints, which includes
incremental checkpoints on the RocksDB backend.
For the user, the difference is really just in the command they use. Previously
they did
{code}
bin/flink stop --withSavepoint hdfs:///path/to/savepoint
{code}
This issue wishes to introduce
{code}
bin/flink stop --withCheckpoint
{code}
With my suggestion it would be
{code}
bin/flink stop --withSavepoint hdfs:///path/to/savepoint --snapshotFormat canonical|optimized|incremental|whatever
{code}
which keeps the clear distinction between checkpoints and savepoints but allows
an optimized format for the savepoint, which is what users want in some cases.
For the FLIP-41 effort, this means that the new format is not a "savepoint
format" but rather a canonical (or unified) format. You could almost do a
search-and-replace in the FLIP but there are some other changes like specific
class hierarchies that are suggested in the doc. Savepoints would by default
use this format so that they are compatible between backends but users can
choose to do a savepoint in a different format.
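As a rough illustration of the idea, here is a minimal Python sketch (not Flink code) of a user-triggered savepoint request in which the format is a separate, optional choice that defaults to the canonical one. The names `SnapshotFormat`, `SavepointRequest` and `parse_stop_args` are hypothetical, the `--snapshotFormat` flag is only the proposal from this comment, and treating the canonical format as the only backend-portable one is an assumption made for the example.

```python
import argparse
from dataclasses import dataclass
from enum import Enum


class SnapshotFormat(Enum):
    """Hypothetical format names, taken from the proposed --snapshotFormat flag."""
    CANONICAL = "canonical"      # unified format, compatible between backends
    OPTIMIZED = "optimized"      # backend-specific layout
    INCREMENTAL = "incremental"  # e.g. what incremental checkpoints use today


@dataclass(frozen=True)
class SavepointRequest:
    """A user-controlled snapshot: still a savepoint, whatever its format."""
    target_dir: str
    snapshot_format: SnapshotFormat

    def portable_across_backends(self) -> bool:
        # Assumption for this sketch: only the canonical format is portable.
        return self.snapshot_format is SnapshotFormat.CANONICAL


def parse_stop_args(argv):
    """Parse the proposed 'stop' options; the format defaults to canonical."""
    parser = argparse.ArgumentParser(prog="flink stop")
    parser.add_argument("--withSavepoint", dest="target_dir", required=True)
    parser.add_argument(
        "--snapshotFormat",
        dest="snapshot_format",
        choices=[f.value for f in SnapshotFormat],
        default=SnapshotFormat.CANONICAL.value,
    )
    ns = parser.parse_args(argv)
    return SavepointRequest(ns.target_dir, SnapshotFormat(ns.snapshot_format))
```

With no `--snapshotFormat` given, the request falls back to the canonical format; passing `--snapshotFormat incremental` changes only the format, not the fact that the user triggered a savepoint.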
Does this description help?
> Support TERMINATE/SUSPEND Job with Checkpoint
> ---------------------------------------------
>
> Key: FLINK-12619
> URL: https://issues.apache.org/jira/browse/FLINK-12619
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / State Backends
> Reporter: Congxian Qiu(klion26)
> Assignee: Congxian Qiu(klion26)
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Inspired by the idea of FLINK-11458, we propose to support terminating/suspending
> a job with a checkpoint. This improvement works together with the incremental and
> external checkpoint features: if checkpoints are retained and this feature
> is configured, we will trigger a checkpoint before the job stops. It could
> accelerate job recovery a lot since:
> 1. No source rewinding is required any more.
> 2. It's much faster than taking a savepoint since incremental checkpointing is
> enabled.
> Please note that conceptually savepoints are different from checkpoints in a
> similar way that backups are different from recovery logs in traditional
> database systems. So we suggest using this feature only for job recovery,
> while sticking with FLINK-11458 for the
> upgrading/cross-cluster-job-migration/state-backend-switch cases.