AHeise commented on a change in pull request #16200:
URL: https://github.com/apache/flink/pull/16200#discussion_r655578539



##########
File path: docs/content/docs/ops/state/unaligned_checkpoints.md
##########
@@ -0,0 +1,118 @@
+---
+title: "Unaligned checkpoints"
+weight: 9
+type: docs
+aliases:
+- /ops/state/unalgined_checkpoints.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+# Unaligned checkpoints
+
+Starting with Flink 1.11, checkpoints can be unaligned.
+[Unaligned checkpoints]({{< ref "docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)
+contain in-flight data (i.e., data stored in buffers) as part of the checkpoint state, allowing
+checkpoint barriers to overtake these buffers. Thus, the checkpoint duration becomes independent of
+the current throughput as checkpoint barriers are effectively not embedded into the stream of data
+anymore.
+
+You should use unaligned checkpoints if your checkpointing durations are very high due to
+backpressure. Then, checkpointing time becomes mostly independent of the end-to-end latency. Be
+aware that unaligned checkpointing adds I/O load on the state backends, so you shouldn't use it when the
+I/O to the state backend is actually the bottleneck during checkpointing.
+
+### Alignment timeout
+
+After enabling unaligned checkpoints, you can also specify the alignment timeout programmatically:
+
+```java
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.getCheckpointConfig().setAlignmentTimeout(Duration.ofSeconds(30));
+```
+
+or in the `flink-conf.yaml` configuration file:
+
+```
+execution.checkpointing.alignment-timeout: 30 s
+```
+
+When activated, each checkpoint will still begin as an aligned checkpoint, but if the alignment time
+for some subtask exceeds this timeout, then the checkpoint will proceed as an unaligned checkpoint.
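
The fallback rule described above can be sketched in plain Java. This is only an illustration under stated assumptions: `AlignmentTimeoutSketch`, `Mode`, and `checkpointMode` are hypothetical names that are not part of Flink's API, and the sketch models the decision for a single subtask rather than Flink's actual implementation.

```java
import java.time.Duration;

public class AlignmentTimeoutSketch {

    // Hypothetical enum for illustration; Flink does not expose this type.
    enum Mode { ALIGNED, UNALIGNED }

    // Hypothetical helper: a checkpoint starts aligned and is treated as
    // unaligned once the observed alignment time exceeds the timeout.
    static Mode checkpointMode(Duration alignmentTime, Duration alignmentTimeout) {
        return alignmentTime.compareTo(alignmentTimeout) > 0
                ? Mode.UNALIGNED
                : Mode.ALIGNED;
    }

    public static void main(String[] args) {
        // Same value as the 30 s timeout used in the configuration example.
        Duration timeout = Duration.ofSeconds(30);

        // Alignment finished within the timeout: the checkpoint stays aligned.
        System.out.println(checkpointMode(Duration.ofSeconds(5), timeout));

        // Alignment took longer than the timeout: proceed as unaligned.
        System.out.println(checkpointMode(Duration.ofSeconds(45), timeout));
    }
}
```

With the 30 second timeout from the example, an alignment time of 5 seconds keeps the checkpoint aligned, while 45 seconds switches it to unaligned.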
+
+## Limitations
+
+### Concurrent checkpoints
+
+Flink currently does not support concurrent unaligned checkpoints. However, due to the more
+predictable and shorter checkpointing times, concurrent checkpoints might not be needed at all.
+Note that savepoints also cannot run concurrently with unaligned checkpoints, so they will take
+slightly longer.
+
+### Interplay with watermarks
+
+Unaligned checkpoints break an implicit guarantee with respect to watermarks during recovery.
+Currently, Flink generates the watermark as the first step of recovery instead of storing the latest
+watermark in the operators to ease rescaling. In unaligned checkpoints, that means on recovery,
+**Flink generates watermarks after it restores in-flight data**. If your pipeline uses an
+**operator that applies the latest watermark on each record**, it will produce **different results** than with
+aligned checkpoints. If your operator depends on the latest watermark being always available, the
+workaround is to store the watermark in the operator state. In that case, watermarks should be
+stored per key group in a union state to support rescaling.
+
+### Certain connections are not checkpointed
+
+There are types of connections with properties that are impossible to keep with channel data stored
+in checkpoints. To preserve these characteristics and ensure no state corruption or unexpected
+behaviour, unaligned checkpoints are disabled for such connections. All other exchanges still
+perform unaligned checkpoints.

Review comment:
       @pnowojski, you do realize that we have ~10 partitioners and 3 of them are not working, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

