[GitHub] [flink] dawidwys commented on a change in pull request #18909: [hotfix] improvements to savepoint docs

GitBox Thu, 24 Feb 2022 07:47:56 -0800


dawidwys commented on a change in pull request #18909:
URL: https://github.com/apache/flink/pull/18909#discussion_r814013924




##########
File path: docs/content/docs/ops/state/savepoints.md
##########
@@ -229,40 +227,31 @@ user's control and never delete any of the files. In this 
mode you can start mul
 same snapshot.
 
 In order to make sure Flink does not depend on any of the files from that 
snapshot,
-it will force the first, completed checkpoint after the restore to be a full 
one. What it means,
-depends on the chosen state backend. For example, it makes no difference for 
the hashmap state
-backend, as there checkpoints never share files. On the other hand, RocksDB 
state backend might need
-to either reupload or duplicate some files that were previously uploaded for 
the restored snapshot,
-making the first checkpoint after restore non-incremental. 
+it will force the first (successful) checkpoint to be a full checkpoint as 
opposed to an incremental one. 
+This only makes a difference for `state.backend: rocksdb`, because all other 
state backends always take full checkpoints.
 
-Once the first full checkpoint completes, all subsequent checkpoints will be 
taken as usual.
-Consequently, once a checkpoint succeeds it is safe to delete the original 
snapshot. We can not do
-that earlier, because if we do not have any completed checkpoints, we will try 
to fail over to the
-one we restored from.
+Once the first full checkpoint completes, all subsequent checkpoints will be 
taken as usual/configured.
+Consequently, once a checkpoint succeeds you can manually delete the original 
snapshot. You can not do
+this earlier, because without any completed checkpoints Flink will - upon 
failure - try to recover from the initial snapshot.
 
 <div style="text-align: center">
   {{< img src="/fig/restore-mode-no_claim.svg" alt="NO_CLAIM restore mode" 
width="70%" >}}
 </div>
 
 **CLAIM**
 
-The other available mode is the *CLAIM* mode. In this mode Flink will claim 
the ownership of the snapshot
-and might delete it at a certain point in time, once it becomes subsumed. 
Flink keeps around
+The other available mode is the *CLAIM* mode. In this mode Flink claims 
ownership of the snapshot
+and essentially treats it like a checkpoint: its controls the lifecycle and 
might delete it if it is not needed for recovery anymore.

Review comment:
       `its controls` -> `it controls`? `controls its lifecycle`?

##########
File path: docs/content/docs/ops/state/savepoints.md
##########
@@ -199,23 +200,20 @@ This submits a job and specifies a savepoint to resume 
from. You may give a path
 
 #### Allowing Non-Restored State
 
-By default the resume operation will try to map all state of the savepoint 
back to the program you are restoring with. If you dropped an operator, you can 
allow to skip state that cannot be mapped to the new program via 
`--allowNonRestoredState` (short: `-n`) option:
+By default, the resume operation will try to map all state of the savepoint 
back to the program you are restoring with. If you dropped an operator, you can 
allow to skip state that cannot be mapped to the new program via 
`--allowNonRestoredState` (short: `-n`) option:
 
 #### Restore mode
 
-Flink 1.15 changed and at the same time added a possibility to choose a mode 
in which savepoints or 
-[externalized checkpoints]({{< ref "docs/ops/state/checkpoints" 
>}}/#resuming-from-a-retained-checkpoint)
-can be restored. Both savepoints and externalized checkpoints behave similarly 
in that context and
-will be referenced as snapshots, unless we need to explain specific properties 
of one of the two.
+The `Restore Mode` determines who takes ownership of the files  that make up a 
Savepoint or [externalized checkpoints]({{< ref "docs/ops/state/checkpoints" 
>}}/#resuming-from-a-retained-checkpoint) after restoring it.
+Both savepoints and externalized checkpoints behave similarly in this context. 
+Here, they are just called "snapshots" unless explicitely noted otherwise.
 
-The restore mode determines who takes ownership of the files of the snapshots 
that we are restoring
-from. Snapshots can be owned either by a user or Flink itself. If a snapshot 
is owned by a user,
-Flink will not delete its files, moreover, Flink can not depend on the 
existence of the files from
-such a snapshot, as it might be deleted outside of Flink's control. 
+As mentioned, the restore mode determines who takes over ownership of the 
files of the snapshots that we are restoring from. 
+Snapshots can be owned either by a user or Flink itself. 
+If a snapshot is owned by a user,  Flink will not delete its files, moreover, 
Flink can not depend on the existence of the files from such a snapshot, as it 
might be deleted outside of Flink's control. 
 
-There is no one perfect restore mode, each comes with its own price, but they 
can serve different
-purposes. Nevertheless, we believe the default *NO_CLAIM* mode is a good 
tradeoff for most usages,
-as it provides clear ownership with a small price for the first checkpoint 
after the restore.
+Each restore mode serves a specific purposes. 

Review comment:
       `purposes` -> `purpose`

##########
File path: docs/content/docs/ops/state/savepoints.md
##########
@@ -229,40 +227,31 @@ user's control and never delete any of the files. In this 
mode you can start mul
 same snapshot.
 
 In order to make sure Flink does not depend on any of the files from that 
snapshot,
-it will force the first, completed checkpoint after the restore to be a full 
one. What it means,
-depends on the chosen state backend. For example, it makes no difference for 
the hashmap state
-backend, as there checkpoints never share files. On the other hand, RocksDB 
state backend might need
-to either reupload or duplicate some files that were previously uploaded for 
the restored snapshot,
-making the first checkpoint after restore non-incremental. 
+it will force the first (successful) checkpoint to be a full checkpoint as 
opposed to an incremental one. 
+This only makes a difference for `state.backend: rocksdb`, because all other 
state backends always take full checkpoints.
 
-Once the first full checkpoint completes, all subsequent checkpoints will be 
taken as usual.
-Consequently, once a checkpoint succeeds it is safe to delete the original 
snapshot. We can not do
-that earlier, because if we do not have any completed checkpoints, we will try 
to fail over to the
-one we restored from.
+Once the first full checkpoint completes, all subsequent checkpoints will be 
taken as usual/configured.
+Consequently, once a checkpoint succeeds you can manually delete the original 
snapshot. You can not do
+this earlier, because without any completed checkpoints Flink will - upon 
failure - try to recover from the initial snapshot.
 
 <div style="text-align: center">
   {{< img src="/fig/restore-mode-no_claim.svg" alt="NO_CLAIM restore mode" 
width="70%" >}}
 </div>
 
 **CLAIM**
 
-The other available mode is the *CLAIM* mode. In this mode Flink will claim 
the ownership of the snapshot
-and might delete it at a certain point in time, once it becomes subsumed. 
Flink keeps around
+The other available mode is the *CLAIM* mode. In this mode Flink claims 
ownership of the snapshot
+and essentially treats it like a checkpoint: its controls the lifecycle and 
might delete it if it is not needed for recovery anymore.
+Hence, it is not safe to manually delete the snapshot or to start two jobs 
from the same snapshot. 
+Flink keeps around
 a [configured number]({{< ref 
"docs/dev/datastream/fault-tolerance/checkpointing" 
>}}/#state-checkpoints-num-retained)
-of checkpoints. Moreover, the snapshot becomes owned by Flink, it is no longer 
safe to remove it
-manually. Moreover, Flink might immediately build an incremental checkpoint on 
top of the initial
-snapshot.
-
-{{< hint warning >}}
-You should not start multiple jobs from the same snapshot, if you claim it. 
One of the jobs can remove
-the snapshot, while others still depend on it.
-{{< /hint >}}
+of checkpoints.
 
 <div style="text-align: center">
   {{< img src="/fig/restore-mode-claim.svg" alt="CLAIM restore mode" 
width="70%" >}}
 </div>
 
-{{< hint warning >}}
+{{< hint info >}}

Review comment:
       tbh, I'd rather keep it as a `warning` as those are limitations, but I 
am not strong on that




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] dawidwys commented on a change in pull request #18909: [hotfix] improvements to savepoint docs

Reply via email to