infoverload commented on a change in pull request #17595:
URL: https://github.com/apache/flink/pull/17595#discussion_r738525704
##########
File path: docs/content/docs/ops/state/checkpoints_backpressure.md
##########
@@ -23,7 +24,40 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
-# Unaligned checkpoints
+# Checkpointing under backpressure
+
+Normally, aligned checkpointing time is dominated by the synchronous and asynchronous parts of the
+checkpointing process. However, when a Flink job is running under heavy backpressure, the dominant
+factor in the end-to-end time of a checkpoint can be the time to propagate checkpoint barriers to
+all operators/subtasks (why this is the case is explained in the overview of the
+[checkpointing process]({{< ref "docs/concepts/stateful-stream-processing" >}}#checkpointing)).
+This can be observed as high
+[alignment time and start delay metrics]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}#history-tab).
+When this happens and becomes an issue, there are three ways to address the problem:
+1. Remove the source of the backpressure by optimising the Flink job, adjusting Flink or JVM configuration, or simply scaling up.
+2. Reduce the amount of buffered in-flight data in the Flink job.
+3. Enable unaligned checkpoints.
+
+Note that these options are not mutually exclusive and can be combined. This document
+focuses on the latter two options.
+
+## Buffer debloating
+
+Flink 1.14 introduced a new tool to automatically control an amount of the buffered in-flight data
+between Flink operators/subtasks. The buffer debloating mechanism can be enabled by setting the property
+`taskmanager.network.memory.buffer-debloat.enabled` to `true`. How does it work and how to configure
+it is described in more details in the [network memory tuning guide]({{< ref "docs/deployment/memory/network_mem_tuning" >}}#the-buffer-debloating-mechanism).
+
+This feature works both with aligned and unaligned checkpoints and can improve checkpointing times
+in both cases, but the effect of the debloating is most easily visible with aligned checkpoints.
+When using buffer debloating with unaligned checkpoints, the added benefit will be smaller checkpoints
+size and quicker recovery times (there will be less in-flight data to persist and recover).
+
+Keep in mind that you can also manually reduce the amount of the buffered in-flight data. How to do
+just that is also described in the aforementioned
+[network memory tuning guide]({{< ref "docs/deployment/memory/network_mem_tuning" >}}).
Review comment:
```suggestion
Flink 1.14 introduced a new tool to automatically control the amount of buffered in-flight data
between Flink operators/subtasks. The buffer debloating mechanism can be enabled by setting the property
`taskmanager.network.memory.buffer-debloat.enabled` to `true`.
This feature works with both aligned and unaligned checkpoints and can improve checkpointing times
in both cases, but the effect of the debloating is most visible with aligned checkpoints.
When using buffer debloating with unaligned checkpoints, the added benefit will be smaller checkpoint
sizes and quicker recovery times (there will be less in-flight data to persist and recover).
Keep in mind that you can also manually reduce the amount of buffered in-flight data.
For more information on how the buffer debloating feature works and how to configure it, please refer to the [network memory tuning guide]({{< ref "docs/deployment/memory/network_mem_tuning" >}}#the-buffer-debloating-mechanism).
```
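The following is a minimal sketch showing how a Java DataStream program could switch on both options discussed above: buffer debloating and unaligned checkpoints. It assumes the Flink 1.14 APIs `Configuration`, `StreamExecutionEnvironment.getExecutionEnvironment(Configuration)` and `CheckpointConfig#enableUnalignedCheckpoints`. It also assumes a local run, where the configuration is handed to the embedded MiniCluster. On a real cluster, the `taskmanager.*` option would normally go into `flink-conf.yaml` instead. The class name, the 10 s checkpoint interval and the toy pipeline are hypothetical.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical example class, not part of Flink.
public class CheckpointingUnderBackpressureExample {

    /** Builds a Configuration with buffer debloating switched on. */
    public static Configuration buildConfiguration() {
        Configuration conf = new Configuration();
        // Key quoted in the docs change. It is a TaskManager option, so on a
        // real cluster it would normally be set in flink-conf.yaml instead.
        conf.setBoolean("taskmanager.network.memory.buffer-debloat.enabled", true);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in a local run this configuration reaches the embedded
        // MiniCluster, so the TaskManager option takes effect.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(buildConfiguration());

        // enableCheckpointing(long) uses EXACTLY_ONCE mode by default,
        // which unaligned checkpoints require.
        env.enableCheckpointing(10_000L); // arbitrary 10 s interval
        env.getCheckpointConfig().enableUnalignedCheckpoints();

        // Toy bounded pipeline, only here so the job has something to run.
        env.fromElements(1, 2, 3).print();
        env.execute("checkpointing-under-backpressure-sketch");
    }
}
```

Debloating and unaligned checkpoints are enabled independently here because, as the text notes, the two options are not mutually exclusive.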
--
This is an automated message from the Apache Git Service.