[GitHub] [flink] infoverload commented on a change in pull request #17595: [FLINK-24670][docs] Restructure unaligned checkpoints docs to checkpointing under backpressure

GitBox Thu, 28 Oct 2021 08:45:57 -0700


infoverload commented on a change in pull request #17595:
URL: https://github.com/apache/flink/pull/17595#discussion_r738525704




##########
File path: docs/content/docs/ops/state/checkpoints_backpressure.md
##########
@@ -23,7 +24,40 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
-# Unaligned checkpoints
+# Checkpointing under backpressure
+
+Normally aligned checkpointing time is dominated by the synchronous and asynchronous parts of the
+checkpointing process. However, when Flink job is running under a heavy backpressure, the dominant
+factor in the end to end time of a checkpoint can be the time to propagate checkpoint barriers to
+all operators/subtasks (why this is the case is explained in the overview of the
+[checkpointing process]({{< ref "docs/concepts/stateful-stream-processing" >}}#checkpointing)).
+This can be observed by high
+[alignment time and start delay metrics]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}#history-tab).
+When this happens and becomes an issue there are basically three ways to address this problem:
+1. Remove the source of the backpressure, by either optimising the Flink job, adjusting Flink or JVM configuration or simply by scaling up.
+2. Reduce an amount of the buffered in-flight data in the Flink job.
+3. Enable unaligned checkpoints.
+
+Note that those options are not mutually exclusive, and you can combine them together. This document
+focuses on the latter two options.
+
+## Buffer debloating
+
+Flink 1.14 introduced a new tool to automatically control an amount of the buffered in-flight data
+between Flink operators/subtasks. The buffer debloating mechanism can be enabled by setting the property
+`taskmanager.network.memory.buffer-debloat.enabled` to `true`. How does it work and how to configure
+it is described in more details in the [network memory tuning guide]({{< ref "docs/deployment/memory/network_mem_tuning" >}}#the-buffer-debloating-mechanism).
+
+This feature works both with aligned and unaligned checkpoints and can improve checkpointing times
+in both cases, but the effect of the debloating is most easily visible with aligned checkpoints.
+When using buffer debloating with unaligned checkpoints, the added benefit will be smaller checkpoints
+size and quicker recovery times (there will be less in-flight data to persist and recover).
+
+Keep in mind that you can also manually reduce the amount of the buffered in-flight data. How to do
+just that is also described in the aforementioned
+[network memory tuning guide]({{< ref "docs/deployment/memory/network_mem_tuning" >}}).

Review comment:
       ```suggestion
   Flink 1.14 introduced a new tool to automatically control the amount of buffered in-flight data
   between Flink operators/subtasks. The buffer debloating mechanism can be enabled by setting the property
   `taskmanager.network.memory.buffer-debloat.enabled` to `true`.

   This feature works with both aligned and unaligned checkpoints and can improve checkpointing times
   in both cases, but the effect of the debloating is most visible with aligned checkpoints.
   When using buffer debloating with unaligned checkpoints, the added benefit will be smaller checkpoint
   sizes and quicker recovery times (there will be less in-flight data to persist and recover).

   Keep in mind that you can also manually reduce the amount of buffered in-flight data.

   For more information on how the buffer debloating feature works and how to configure it, please refer to the [network memory tuning guide]({{< ref "docs/deployment/memory/network_mem_tuning" >}}#the-buffer-debloating-mechanism).
   ```
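
   A short config snippet might also help readers here. The following is only a sketch: it assumes the option is set in the cluster's `flink-conf.yaml`, and the property name is the one quoted in the text above. Deployments that pass configuration another way would need to adapt it.

   ```yaml
   # Assumed location: conf/flink-conf.yaml on the cluster.
   # Enables automatic control of the amount of buffered in-flight data.
   taskmanager.network.memory.buffer-debloat.enabled: true
   ```

   The tuning parameters themselves are best left to the linked network memory tuning guide rather than repeated on this page.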




