XBaith commented on a change in pull request #17736:
URL: https://github.com/apache/flink/pull/17736#discussion_r752409012



##########
File path: docs/content.zh/docs/ops/state/checkpointing_under_backpressure.md
##########
@@ -26,176 +26,150 @@ under the License.
 -->
 # Checkpointing under backpressure
 
-Normally aligned checkpointing time is dominated by the synchronous and asynchronous parts of the
-checkpointing process. However, when a Flink job is running under heavy backpressure, the dominant
-factor in the end-to-end time of a checkpoint can be the time to propagate checkpoint barriers to
-all operators/subtasks. This is explained in the overview of the
-[checkpointing process]({{< ref "docs/concepts/stateful-stream-processing" >}}#checkpointing)).
-and can be observed by high
-[alignment time and start delay metrics]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}#history-tab).
-When this happens and becomes an issue, there are three ways to address the problem:
-1. Remove the backpressure source by optimizing the Flink job, by adjusting Flink or JVM configurations, or by scaling up.
-2. Reduce the amount of buffered in-flight data in the Flink job.
-3. Enable unaligned checkpoints.
-
-These options are not mutually exclusive and can be combined together. This document
-focuses on the latter two options.
-
-## Buffer debloating
-
-Flink 1.14 introduced a new tool to automatically control the amount of buffered in-flight data
-between Flink operators/subtasks. The buffer debloating mechanism can be enabled by setting the property
-`taskmanager.network.memory.buffer-debloat.enabled` to `true`.
-
-This feature works with both aligned and unaligned checkpoints and can improve checkpointing times
-in both cases, but the effect of the debloating is most visible with aligned checkpoints.
-When using buffer debloating with unaligned checkpoints, the added benefit will be smaller checkpoint
-sizes and quicker recovery times (there will be less in-flight data to persist and recover).
-
-For more information on how the buffer debloating feature works and how to configure it, please refer to the
-[network memory tuning guide]({{< ref "docs/deployment/memory/network_mem_tuning" >}}).
-Keep in mind that you can also manually reduce the amount of buffered in-flight data which is also
-described in the aforementioned tuning guide.
-
-## Unaligned checkpoints
-
-Starting with Flink 1.11, checkpoints can be unaligned.
+通常情况下,对齐 Checkpoint 的时长受 Checkpointing 过程中的同步和异步两个部分的影响。
+然而,当 Flink 作业正运行在严重的背压情况下时,Checkpoint 端到端延迟的主要影响因子将会是传递 Checkpoint Barrier 到
+所有的算子/子任务的时间。这在 [checkpointing process]({{< ref "docs/concepts/stateful-stream-processing" >}}#checkpointing))
+的概述中有说明原因。并且可以通过高 [alignment time and start delay metrics]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}#history-tab)
+观察到。
+当这种情况发生并成为一个问题时,有三种方法可以解决这个问题:
+1. 通过优化 Flink 作业,调整 Flink 参数或是 JVM 参数,抑或是扩容来消除背压源头。
+2. 减少 Flink 作业中缓冲在 In-flight 数据的数据量。
+3. 启用非对齐 Checkpoints。
+这些选项并不是互斥的,并且可以组合在一起使用。本文将重点介绍后两个选项。
+
+## 缓冲区 Debloating
+
+Flink 1.14 引入了一个新的工具用于自动控制在 Flink 算子/子任务之间缓冲的 In-flight 数据的数据量。缓冲区 Debloating 机
+制可以通过将属性`taskmanager.network.memory.buffer-debloat.enabled`设置为`true`来启用。
+
+此功能对于对齐和非对齐 Checkpoint 都生效,并且在这两种情况下都能缩短 Checkpointing 的时间,但是在对齐 Checkpoint 情
+况下 Debloating 的效果最为明显。
+当在非对齐 Checkpoint 情况下使用缓冲区 Debloating 时,附加的好处是 Checkpoint 大小会更小,并且恢复时间更快 (需要保存
+和恢复的 In-flight 数据更少)。
+
+有关缓冲区 Debloating 功能如何工作以及如何配置的更多信息,可以参考 [network memory tuning guide]({{< ref "docs/deployment/memory/network_mem_tuning" >}})。
+请注意,您仍然可以继续使用在前面调优指南中介绍的方式来手动减少缓冲在 In-flight 数据的数据量。
+
+## 非对齐 Checkpoints
+
+从Flink 1.11开始,Checkpoint 可以是非对齐的。
 [Unaligned checkpoints]({{< ref "docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)
-contain in-flight data (i.e., data stored in buffers) as part of the checkpoint state, allowing
-checkpoint barriers to overtake these buffers. Thus, the checkpoint duration becomes independent of
-the current throughput as checkpoint barriers are effectively not embedded into the stream of data
-anymore.
+包含 In-flight 数据(如缓存中的数据)作为 Checkpoint State的一部分,允许 Checkpoint Barrier 跨越这些缓冲区。因此,
+Checkpoint 时长变得与当前吞吐量无关,因为事实上 Checkpoint Barrier 已经不再嵌入在数据流之中。
 
-You should use unaligned checkpoints if your checkpointing durations are very high due to
-backpressure. Then, checkpointing time becomes mostly independent of the end-to-end latency. Be
-aware unaligned checkpointing adds to I/O to the state storage, so you shouldn't use it when the
-I/O to the state storage is actually the bottleneck during checkpointing.
+您应该使用非对齐 Checkpoint,如果您的 Checkpointing 由于背压导致周期非常的长。于是,Checkpointing 时间基本上就独立

Review comment:
       Maybe reversing the sentence (condition first, then the recommendation) would read better?
   ```suggestion
   如果您的 Checkpointing 由于背压导致周期非常的长,您应该使用非对齐 Checkpoint。这样,Checkpointing 时间基本上就独立
   ```



