dawidwys commented on a change in pull request #18909:
URL: https://github.com/apache/flink/pull/18909#discussion_r814013924
##########
File path: docs/content/docs/ops/state/savepoints.md
##########
@@ -229,40 +227,31 @@ user's control and never delete any of the files. In this
mode you can start mul
same snapshot.
In order to make sure Flink does not depend on any of the files from that
snapshot,
-it will force the first, completed checkpoint after the restore to be a full
one. What it means,
-depends on the chosen state backend. For example, it makes no difference for
the hashmap state
-backend, as there checkpoints never share files. On the other hand, RocksDB
state backend might need
-to either reupload or duplicate some files that were previously uploaded for
the restored snapshot,
-making the first checkpoint after restore non-incremental.
+it will force the first (successful) checkpoint to be a full checkpoint as
opposed to an incremental one.
+This only makes a difference for `state.backend: rocksdb`, because all other
state backends always take full checkpoints.
-Once the first full checkpoint completes, all subsequent checkpoints will be
taken as usual.
-Consequently, once a checkpoint succeeds it is safe to delete the original
snapshot. We can not do
-that earlier, because if we do not have any completed checkpoints, we will try
to fail over to the
-one we restored from.
+Once the first full checkpoint completes, all subsequent checkpoints will be
taken as usual/configured.
+Consequently, once a checkpoint succeeds you can manually delete the original
snapshot. You can not do
+this earlier, because without any completed checkpoints Flink will - upon
failure - try to recover from the initial snapshot.
<div style="text-align: center">
{{< img src="/fig/restore-mode-no_claim.svg" alt="NO_CLAIM restore mode"
width="70%" >}}
</div>
**CLAIM**
-The other available mode is the *CLAIM* mode. In this mode Flink will claim
the ownership of the snapshot
-and might delete it at a certain point in time, once it becomes subsumed.
Flink keeps around
+The other available mode is the *CLAIM* mode. In this mode Flink claims
ownership of the snapshot
+and essentially treats it like a checkpoint: its controls the lifecycle and
might delete it if it is not needed for recovery anymore.
Review comment:
`its controls` -> `it controls`? `controls its lifecycle`?
##########
File path: docs/content/docs/ops/state/savepoints.md
##########
@@ -199,23 +200,20 @@ This submits a job and specifies a savepoint to resume
from. You may give a path
#### Allowing Non-Restored State
-By default the resume operation will try to map all state of the savepoint
back to the program you are restoring with. If you dropped an operator, you can
allow to skip state that cannot be mapped to the new program via
`--allowNonRestoredState` (short: `-n`) option:
+By default, the resume operation will try to map all state of the savepoint
back to the program you are restoring with. If you dropped an operator, you can
allow to skip state that cannot be mapped to the new program via
`--allowNonRestoredState` (short: `-n`) option:
#### Restore mode
-Flink 1.15 changed and at the same time added a possibility to choose a mode
in which savepoints or
-[externalized checkpoints]({{< ref "docs/ops/state/checkpoints"
>}}/#resuming-from-a-retained-checkpoint)
-can be restored. Both savepoints and externalized checkpoints behave similarly
in that context and
-will be referenced as snapshots, unless we need to explain specific properties
of one of the two.
+The `Restore Mode` determines who takes ownership of the files that make up a
Savepoint or [externalized checkpoints]({{< ref "docs/ops/state/checkpoints"
>}}/#resuming-from-a-retained-checkpoint) after restoring it.
+Both savepoints and externalized checkpoints behave similarly in this context.
+Here, they are just called "snapshots" unless explicitely noted otherwise.
-The restore mode determines who takes ownership of the files of the snapshots
that we are restoring
-from. Snapshots can be owned either by a user or Flink itself. If a snapshot
is owned by a user,
-Flink will not delete its files, moreover, Flink can not depend on the
existence of the files from
-such a snapshot, as it might be deleted outside of Flink's control.
+As mentioned, the restore mode determines who takes over ownership of the
files of the snapshots that we are restoring from.
+Snapshots can be owned either by a user or Flink itself.
+If a snapshot is owned by a user, Flink will not delete its files, moreover,
Flink can not depend on the existence of the files from such a snapshot, as it
might be deleted outside of Flink's control.
-There is no one perfect restore mode, each comes with its own price, but they
can serve different
-purposes. Nevertheless, we believe the default *NO_CLAIM* mode is a good
tradeoff for most usages,
-as it provides clear ownership with a small price for the first checkpoint
after the restore.
+Each restore mode serves a specific purposes.
Review comment:
`purposes` -> `purpose`
##########
File path: docs/content/docs/ops/state/savepoints.md
##########
@@ -229,40 +227,31 @@ user's control and never delete any of the files. In this
mode you can start mul
same snapshot.
In order to make sure Flink does not depend on any of the files from that
snapshot,
-it will force the first, completed checkpoint after the restore to be a full
one. What it means,
-depends on the chosen state backend. For example, it makes no difference for
the hashmap state
-backend, as there checkpoints never share files. On the other hand, RocksDB
state backend might need
-to either reupload or duplicate some files that were previously uploaded for
the restored snapshot,
-making the first checkpoint after restore non-incremental.
+it will force the first (successful) checkpoint to be a full checkpoint as
opposed to an incremental one.
+This only makes a difference for `state.backend: rocksdb`, because all other
state backends always take full checkpoints.
-Once the first full checkpoint completes, all subsequent checkpoints will be
taken as usual.
-Consequently, once a checkpoint succeeds it is safe to delete the original
snapshot. We can not do
-that earlier, because if we do not have any completed checkpoints, we will try
to fail over to the
-one we restored from.
+Once the first full checkpoint completes, all subsequent checkpoints will be
taken as usual/configured.
+Consequently, once a checkpoint succeeds you can manually delete the original
snapshot. You can not do
+this earlier, because without any completed checkpoints Flink will - upon
failure - try to recover from the initial snapshot.
<div style="text-align: center">
{{< img src="/fig/restore-mode-no_claim.svg" alt="NO_CLAIM restore mode"
width="70%" >}}
</div>
**CLAIM**
-The other available mode is the *CLAIM* mode. In this mode Flink will claim
the ownership of the snapshot
-and might delete it at a certain point in time, once it becomes subsumed.
Flink keeps around
+The other available mode is the *CLAIM* mode. In this mode Flink claims
ownership of the snapshot
+and essentially treats it like a checkpoint: its controls the lifecycle and
might delete it if it is not needed for recovery anymore.
+Hence, it is not safe to manually delete the snapshot or to start two jobs
from the same snapshot.
+Flink keeps around
a [configured number]({{< ref
"docs/dev/datastream/fault-tolerance/checkpointing"
>}}/#state-checkpoints-num-retained)
-of checkpoints. Moreover, the snapshot becomes owned by Flink, it is no longer
safe to remove it
-manually. Moreover, Flink might immediately build an incremental checkpoint on
top of the initial
-snapshot.
-
-{{< hint warning >}}
-You should not start multiple jobs from the same snapshot, if you claim it.
One of the jobs can remove
-the snapshot, while others still depend on it.
-{{< /hint >}}
+of checkpoints.
<div style="text-align: center">
{{< img src="/fig/restore-mode-claim.svg" alt="CLAIM restore mode"
width="70%" >}}
</div>
-{{< hint warning >}}
+{{< hint info >}}
Review comment:
tbh, I'd rather keep it as a `warning` as those are limitations, but I
am not strong on that
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]