infoverload commented on code in PR #516:
URL: https://github.com/apache/flink-web/pull/516#discussion_r864976179


##########
_posts/2022-04-01-tidying-snapshots-up.md:
##########
@@ -0,0 +1,213 @@
+---
+layout: post 
+title:  "Tidying up snapshots"
+date: 2022-04-01T00:00:00.000Z 
+authors:
+- dwysakowicz:
+  name: "Dawid Wysakowicz"
+  twitter: "dwysakowicz"
+
+excerpt: TODO
+
+---
+
+{% toc %}
+
+Over the years, Flink has become a well-established project in the data streaming domain, and a mature project requires a slight shift of priorities: away from thinking purely about new features and towards caring more about stability and operational simplicity. The Flink community has tried to address some known friction points over the last couple of releases, including improvements to the snapshotting process.
+
+Flink 1.13 was the first release in which we announced [unaligned checkpoints]({{site.DOCS_BASE_URL}}flink-docs-release-1.15/docs/concepts/stateful-stream-processing/#unaligned-checkpointing) to be production-ready, and we encourage people to use them if their jobs are backpressured to a point where it causes issues for checkpoints. It was also the release where we [unified the binary format of savepoints](/news/2021/05/03/release-1.13.0.html#switching-state-backend-with-savepoints) across all state backends, which makes it possible to switch state backends via a savepoint. More on that a bit later.
+
+The next release, 1.14, brought additional improvements. As an alternative and a complement to unaligned checkpoints, we introduced a feature called ["buffer debloating"](/news/2021/09/29/release-1.14.0.html#buffer-debloating). It is built around the concept of automatically adjusting the amount of in-flight data that needs to be aligned while snapshotting. We also fixed another long-standing problem: from 1.14 onwards, it is possible to [continue checkpointing even if there are finished tasks](/news/2021/09/29/release-1.14.0.html#checkpointing-and-bounded-streams) in one's job graph.
+
+The latest 1.15 release is no different: we still pay attention to what makes it hard to operate Flink clusters. In that release we tackled the problem that savepoints can be expensive to take and restore from when they are taken for a very large state stored in the RocksDB state backend. To work around that cost, we had seen users leverage externalized, incremental checkpoints instead of savepoints in order to benefit from the native RocksDB format. To make this more straightforward, we incorporated that approach and made it possible to take savepoints in the native, state-backend-specific format, while still maintaining some savepoint characteristics, such as the ability to relocate the savepoint.
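+
+For illustration, the savepoint format can be picked when triggering a savepoint from the CLI; the job ID and target directory below are placeholders:
+
+```sh
+# Take a savepoint in the state backend's native format (e.g. RocksDB SST files)
+bin/flink savepoint --type native <jobId> [targetDirectory]
+
+# The canonical, state-backend-independent format remains the default
+bin/flink savepoint --type canonical <jobId> [targetDirectory]
+```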
+
+Another issue we have seen with externalized checkpoints is that it has not been clear who owns the checkpoint files. This is especially problematic for incremental RocksDB checkpoints, where you can easily end up in a situation where you do not know which checkpoints depend on which files and thus cannot clean those files up. To solve this issue we added explicit restore modes: `CLAIM`, `NO_CLAIM`, and `LEGACY` (for backwards compatibility), which clearly define whether Flink should take care of cleaning up the snapshot or whether that remains the user's responsibility.
+
+# Restore mode
+
+The restore mode determines who takes ownership of the files that make up a savepoint or an externalized checkpoint after restoring from it. Snapshots, which in this context are either checkpoints or savepoints, can be owned either by the user or by Flink itself. If a snapshot is owned by the user, Flink will not delete its files. Even more importantly, Flink cannot depend on the existence of the files of such a snapshot, as they might be deleted outside of Flink's control.
+
+To begin with, let's look at how things worked so far and what problems that may pose. We kept the old behaviour, which is available if you pick the `LEGACY` mode.
+
+## LEGACY mode
+
+The legacy mode is how Flink worked until 1.15. In this mode, Flink will never delete the initial checkpoint. Unfortunately, at the same time, it is not clear whether the user can ever delete it either. The problem here is that Flink might immediately build an incremental checkpoint on top of the restored one. Subsequent checkpoints then depend on the restored checkpoint. Overall, ownership is not well-defined in this mode.
+
+<div style="text-align: center">
+  <img src="{{ site.baseurl 
}}/img/blog/2022-04-xx-tidying-snapshots/restore-mode-legacy.svg" alt="LEGACY 
restore mode" width="70%">
+</div>
+
+To fix the issue of files whose ownership no one can reliably claim, we introduced the `NO_CLAIM` mode as the new default.
+
+## NO_CLAIM (default) mode
+
+In the `NO_CLAIM` mode, Flink does not assume ownership of the snapshot. It leaves the files under the user's control and never deletes any of them. In this mode you can start multiple jobs from the same snapshot.
+
+In order to make sure Flink does not depend on any of the files from that 
snapshot, it will force
+the first (successful) checkpoint to be a full checkpoint as opposed to an 
incremental one. This
+only makes a difference for `state.backend: rocksdb`, because all other state 
backends always take
+full checkpoints.
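+
+For reference, the incremental checkpoints this applies to are enabled with a configuration along these lines (a minimal sketch of `flink-conf.yaml`):
+
+```yaml
+state.backend: rocksdb
+# incremental snapshots are what creates cross-checkpoint file dependencies
+state.backend.incremental: true
+```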
+
+Once the first full checkpoint completes, all subsequent checkpoints will be taken as usual/configured. Consequently, once a checkpoint succeeds, you can manually delete the original snapshot. You cannot do this earlier, because without any completed checkpoints Flink will, upon failure, try to recover from the initial snapshot.
+
+<div style="text-align: center">
+  <img src="{{ site.baseurl 
}}/img/blog/2022-04-xx-tidying-snapshots/restore-mode-no_claim.svg" 
alt="NO_CLAIM restore mode" width="70%" >
+</div>
+
+If you do not want to sacrifice any performance while taking the first 
checkpoint, we suggest
+looking into the `CLAIM` mode.

Review Comment:
   ```suggestion
   ### NO_CLAIM (default) mode
   
   To fix the issue of files that no one can reliably claim ownership of, we 
introduced the `NO_CLAIM`
   mode as the new default. In this mode, Flink will not assume ownership of 
the snapshot and will leave the files in
   the user's control and never delete any of the files.  You can start 
multiple jobs from the
   same snapshot in this mode.
   
   In order to make sure Flink does not depend on any of the files from that 
snapshot, it will force
   the first (successful) checkpoint to be a full checkpoint as opposed to an 
incremental one. This
   only makes a difference for `state.backend: rocksdb`, because all other 
state backends always take
   full checkpoints.
   
   Once the first full checkpoint completes, all subsequent checkpoints will be 
taken as
   usual/configured. Consequently, once a checkpoint succeeds, you can manually 
delete the original
   snapshot. You can not do this earlier, because without any completed 
checkpoints, Flink will - upon
   failure - try to recover from the initial snapshot.
   
   <div style="text-align: center">
     <img src="{{ site.baseurl 
}}/img/blog/2022-04-xx-tidying-snapshots/restore-mode-no_claim.svg" 
alt="NO_CLAIM restore mode" width="70%" >
   </div>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
