smattheis commented on a change in pull request #18765:
URL: https://github.com/apache/flink/pull/18765#discussion_r817857700
##########
File path: docs/content/docs/ops/state/checkpoints.md
##########
@@ -35,6 +35,8 @@ the same semantics as a failure-free execution.
See [Checkpointing]({{< ref
"docs/dev/datastream/fault-tolerance/checkpointing" >}}) for how to enable and
configure checkpoints for your program.
+See [Checkpoint VS Savepoint]({{< ref "docs/ops/state/checkpoint_vs_savepoint"
>}}) for understanding the differences with [Savepoints]({{< ref
"docs/ops/state/savepoints" >}}).
Review comment:
```suggestion
To understand the differences between checkpoints and [savepoints]({{< ref
"docs/ops/state/savepoints" >}}), see [checkpoints vs. savepoints]({{< ref
"docs/ops/state/checkpoint_vs_savepoint" >}}).
```
##########
File path: docs/content/docs/ops/state/checkpoint_vs_savepoint.md
##########
@@ -0,0 +1,86 @@
+---
+title: "Checkpoint VS Savepoint"
+weight: 10
+type: docs
+aliases:
+ - /ops/state/checkpoint_vs_savepoint.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Checkpoint VS Savepoint
Review comment:
```suggestion
# Checkpoints vs. savepoints
## Overview
[Savepoints]({{< ref "docs/ops/state/savepoints" >}}) and [checkpoints]({{<
ref "docs/ops/state/checkpoints" >}}) are two
different ways to snapshot an application's state. They differ because they
address different use cases, each with its own design goals and therefore its
own optimizations.
The primary use case for checkpoints is to provide a recovery mechanism in
case of unexpected job failures.
The [checkpoint lifecycle]({{< ref
"docs/dev/datastream/fault-tolerance/checkpointing" >}}) is configurable but
managed by Flink, i.e., checkpoints are created, owned, and released by
Flink without user interaction. More
specifically, checkpoints are created periodically to allow fast state
recovery at any time and are eventually deleted
when no longer needed. The main design goal is to make checkpoints as fast
as possible to create and restore.
These optimizations, however, limit flexibility: the application
must not change between checkpoint creation and recovery.
Checkpoints have the following characteristics:
- fast to create and restore for fast failure recovery
- limited flexibility/usability for other use cases
- managed by Flink runtime
- stored in a state-backend-specific (native) data format (may be incremental,
depending on the backend)
Note: Checkpoints are automatically deleted if the application is terminated
by the user, unless they are
explicitly configured to be retained.
Although [savepoints]({{< ref "docs/ops/state/savepoints" >}}) are created
internally with the same mechanisms as
checkpoints, they are conceptually different and designed for flexibility
and portability in application maintenance
and operation. The primary use case is a planned and manual backup and
resume of an application's state.
Savepoints enable, e.g., Flink version updates, job graph changes,
configuration changes like parallelism, or forking a second job as in red/blue
deployment.
As a consequence, savepoints are slower to create and restore than
checkpoints. Savepoints are also created, owned,
and deleted solely by the user. That means Flink deletes savepoints
neither after job termination nor after
restore.
Savepoints have the following characteristics:
- slower to create and restore than checkpoints
- support flexibility and portability for application maintenance and
operation
- managed by the user
- stored in a state-backend-independent (canonical) format (Note: Since
Flink 1.15, savepoints can also be stored in
the backend-specific [native]({{< ref "docs/ops/state/savepoints"
>}}#savepoint-format) format, which is faster to create
and restore but comes with some limitations.)
## Capabilities and limitations
The following table gives an overview of capabilities and limitations for
the various types of savepoints and
checkpoints.
```
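For illustration only (not part of the suggestion above): the note about retained checkpoints could be backed by a small configuration fragment. The following is a minimal sketch, assuming the `flink-conf.yaml` key names used in the Flink 1.15 line; the interval and both directory paths are hypothetical placeholders, and key names may differ in other releases.

```yaml
# Sketch only: the interval and the paths below are placeholder assumptions.
execution.checkpointing.interval: 1 min
# Keep the latest checkpoint when the user cancels the job
# (without this setting it is deleted on cancellation).
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
# Hypothetical storage locations for checkpoints and savepoints.
state.checkpoints.dir: file:///tmp/flink/checkpoints
state.savepoints.dir: file:///tmp/flink/savepoints
```

With retention enabled, cleaning up the retained checkpoint becomes the user's responsibility, which is worth stating next to the note.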
##########
File path: docs/content/docs/ops/state/checkpoint_vs_savepoint.md
##########
@@ -0,0 +1,86 @@
+---
+title: "Checkpoint VS Savepoint"
Review comment:
```suggestion
title: "Checkpoints vs. savepoints"
```
##########
File path: docs/content/docs/ops/state/savepoints.md
##########
@@ -37,14 +37,7 @@ image. The meta data file of a Savepoint contains
(primarily) pointers to all fi
In order to allow upgrades between programs and Flink versions, it is
important to check out the following section about [assigning IDs to your
operators](#assigning-operator-ids).
{{< /hint >}}
-Conceptually, Flink's Savepoints are different from Checkpoints in a similar
way that backups are different from recovery logs in traditional database
systems. The primary purpose of Checkpoints is to provide a recovery mechanism
in case of
-unexpected job failures. A Checkpoint's lifecycle is managed by Flink, i.e. a
Checkpoint is created, owned, and released by Flink - without user interaction.
As a method of recovery and being periodically triggered, two main
-design goals for the Checkpoint implementation are i) being as lightweight to
create and ii) being as fast to restore from as possible. Optimizations towards
those goals can exploit certain properties, e.g. that the job code
-doesn't change between the execution attempts. Checkpoints are usually dropped
after the job was terminated by the user (except if explicitly configured as
retained Checkpoints).
-
-In contrast to all this, Savepoints are created, owned, and deleted by the
user. Their use-case is for planned, manual backup and resume. For example,
this could be an update of your Flink version, changing your job graph,
-changing parallelism, forking a second job like for a red/blue deployment, and
so on. Of course, Savepoints must survive job termination. Conceptually,
Savepoints can be a bit more expensive to produce and restore and focus
-more on portability and support for the previously mentioned changes to the
job.
+See [Checkpoint VS Savepoint]({{< ref "docs/ops/state/checkpoint_vs_savepoint"
>}}) for understanding the differences with [Checkpoints]({{< ref
"docs/ops/state/checkpoints" >}}).
Review comment:
```suggestion
To make proper use of savepoints, it's important to understand the
differences between [checkpoints]({{< ref "docs/ops/state/checkpoints" >}}) and
savepoints, which are described in [checkpoints vs. savepoints]({{< ref
"docs/ops/state/checkpoint_vs_savepoint" >}}).
```
##########
File path: docs/content/docs/concepts/stateful-stream-processing.md
##########
@@ -311,7 +311,7 @@ mechanism for this.
Savepoints are similar to checkpoints except that they are
**triggered by the user** and **don't automatically expire** when newer
-checkpoints are completed.
+checkpoints are completed(see [Checkpoint VS Savepoint]({{< ref
"docs/ops/state/checkpoint_vs_savepoint" >}}) for all differences).
Review comment:
```suggestion
checkpoints are completed. To make proper use of savepoints, it's important
to understand the differences between [checkpoints]({{< ref
"docs/ops/state/checkpoints" >}}) and [savepoints]({{< ref
"docs/ops/state/savepoints" >}}), which are described in [checkpoints vs.
savepoints]({{< ref "docs/ops/state/checkpoint_vs_savepoint" >}}).
```