[GitHub] [flink-web] AHeise commented on a change in pull request #387: Add blog post: From Aligned to Unaligned Checkpoints - Part 1: Checkpoints, Alignment, and Backpressure

GitBox Thu, 15 Oct 2020 02:12:28 -0700


AHeise commented on a change in pull request #387:
URL: https://github.com/apache/flink-web/pull/387#discussion_r505381910




##########
File path: _posts/2020-10-13-from-aligned-to-unaligned-checkpoints-part-1.md
##########
@@ -0,0 +1,117 @@
+---
+layout: post 
+title: "From Aligned to Unaligned Checkpoints - Part 1: Checkpoints, 
Alignment, and Backpressure" 
+date: 2020-10-13T03:00:00.000Z
+authors:
+- Arvid Heise:
+  name: "Arvid Heise"
+- Stephan Ewen:
+  name: "Stephan Ewen"
+excerpt: Apache Flink’s checkpoint-based fault tolerance mechanism is one of 
its defining features. Because of that design, Flink unifies batch and stream 
processing, can easily scale to both very small and extremely large scenarios 
and provides support for many operational features. In this post we recap the 
original checkpointing process in Flink, its core properties and issues under 
backpressure.
+---
+
+Apache Flink’s checkpoint-based fault tolerance mechanism is one of its 
defining features. Because of that design, Flink unifies batch and stream 
processing, can easily scale to both [very 
small](https://hal.inria.fr/hal-02463206/document) and [extremely 
large](https://102.alibaba.com/detail?id=35) scenarios and provides support for 
many operational features like stateful upgrades with [state 
evolution](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/schema_evolution.html)
 or [roll-backs and 
time-travel](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html).
 
+
+Despite all these great properties, Flink's checkpointing method has an 
Achilles heel: the time a checkpoint takes to complete is determined by the 
speed at which data flows through the application. When the application 
backpressures, the processing of checkpoints is backpressured as well (Appendix 
1 recaps what backpressure is and why it can be a good thing). In such cases, 
checkpoints may take longer to complete or even time out completely.
+
+In Flink 1.11, the community introduced a first version of a new feature 
called "[unaligned 
checkpoints](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#unaligned-checkpoints)"
 that aims at solving this issue, while Flink 1.12 plans to further expand its 
functionality. In this two-part blog series, we discuss how Flink’s 
checkpointing mechanism has been modified to support unaligned checkpoints, how 
unaligned checkpoints work, and how this new mode impacts Flink users. In the 
first of the two posts, we start with a recap of the original checkpointing 
process in Flink, its core properties and issues under backpressure.
+
+
+## State in Streaming Applications
+
+Simply put, State is the information that you need to remember across events. 
Even the most trivial streaming applications are typically stateful because of 
their need to “remember” the exact position they are processing data from, in 
the form of a Kafka Partition Offset or a File Offset.

Review comment:
       Yes, good point. From an academic perspective, "or" is always inclusive, 
but in a more informal article, I guess it makes sense to follow the spoken 
"or", which is likely to be read as exclusive.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
