akalash commented on a change in pull request #17953:
URL: https://github.com/apache/flink/pull/17953#discussion_r768548266
##########
File path: docs/content.zh/docs/deployment/memory/network_mem_tuning.md
##########
@@ -24,147 +24,141 @@ specific language governing permissions and limitations
under the License.
-->
-# Network memory tuning guide
+# Network memory tuning guide
-## Overview
+## Overview
-Each record in Flink is sent to the next subtask compounded with other records in a *network buffer*,
-the smallest unit for communication between subtasks. In order to maintain consistent high throughput,
-Flink uses *network buffer queues* (also known as *in-flight data*) on the input and output side of the transmission process.
+Each record in Flink is packed, together with other records, into a *network buffer*, the smallest unit of communication between subtasks, and sent to the next subtask.
+To maintain consistently high throughput, Flink uses *network buffer queues* (also known as *in-flight data*) on the input and output sides of the transmission process.
-Each subtask has an input queue waiting to consume data and an output queue waiting to send data to the next subtask. Having a larger amount of in-flight data means that Flink can provide higher and more resilient throughput in the pipeline. This will, however, cause longer checkpoint times.
+Each subtask has an input queue that receives data and an output queue that sends data to the next subtask.
+In a pipeline, having more in-flight data lets Flink provide higher and more resilient throughput, but it also increases checkpointing time.
-Checkpoints in Flink can only finish once all the subtasks receive all of the injected checkpoint barriers. In [aligned checkpoints]({{< ref "docs/concepts/stateful-stream-processing" >}}#checkpointing), those checkpoint barriers are traveling throughout the job graph along with the network buffers. The larger the amount of in-flight data, the longer the checkpoint barrier propagation time. In [unaligned checkpoints]({{< ref "docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing), the larger the amount of in-flight data, the larger the checkpoint size will be because all of the captured in-flight data has to be persisted as part of the checkpoint.
+A checkpoint can only complete once all the subtasks have received all of the injected checkpoint barriers.
+In [aligned checkpoints]({{< ref "docs/concepts/stateful-stream-processing" >}}#checkpointing), the checkpoint barriers travel through the job graph along with the network buffers.
+The more in-flight data there is, the longer the checkpoint barrier propagation takes. In [unaligned checkpoints]({{< ref "docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing), more in-flight data means a larger checkpoint, because all of the captured in-flight data has to be persisted as part of the checkpoint.
-## The Buffer Debloating Mechanism
+## The Buffer Debloating Mechanism
-Previously, the only way to configure the amount of in-flight data was to specify both the amount and the buffer size. However, ideal values can be difficult to choose since they are different for every deployment. The buffer debloating mechanism added in Flink 1.14 attempts to address this issue by automatically adjusting the amount of in-flight data to reasonable values.
+Previously, the only way to configure the amount of in-flight data was to specify both the number and the size of the buffers. However, ideal values are hard to choose because they differ for every deployment.
+The buffer debloating mechanism introduced in Flink 1.14 tries to solve this by automatically adjusting the amount of in-flight data to a reasonable value.
-The buffer debloating feature calculates the maximum possible throughput for the subtask (in the scenario that it is always busy) and adjusts the amount of in-flight data such that the consumption time of those in-flight data will be equal to the configured value.
+The buffer debloating feature calculates the maximum possible throughput of a subtask (the throughput it would have if it were always busy) and adjusts the amount of in-flight data so that the time needed to consume that data equals the configured target.
-The buffer debloat mechanism can be enabled by setting the property `taskmanager.network.memory.buffer-debloat.enabled` to `true`.
-The targeted time to consume the in-flight data can be configured by setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.
-The default value of the debloat target should be good enough for most cases.
+The buffer debloat mechanism can be enabled by setting `taskmanager.network.memory.buffer-debloat.enabled` to `true`.
+The target time to consume the in-flight data can be specified by setting `taskmanager.network.memory.buffer-debloat.target` to a `duration`.
+The default value should be good enough for most cases.
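
For illustration, the two options above could be set in `flink-conf.yaml` along these lines (the `1s` target is only an example value, not a recommendation):

```yaml
# Enable automatic adjustment of the amount of in-flight data.
taskmanager.network.memory.buffer-debloat.enabled: true
# Target time to consume the buffered in-flight data (example value).
taskmanager.network.memory.buffer-debloat.target: 1s
```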
-This feature uses past throughout data to predict the time required to consume the remaining
-in-flight data. If the predictions are incorrect, the debloating mechanism can fail in one of two ways:
-* There will not be enough buffered data to provide full throughput.
-* There will be too many buffered in-flight data which will negatively affect the aligned checkpoint barriers propagation time or the unaligned checkpoint size.
+This feature uses past throughput data to predict the time required to consume the remaining in-flight data. If the predictions are wrong, the debloating mechanism can fail in one of two ways:
+* There is not enough buffered data to provide full throughput.
+* There is too much buffered in-flight data, which negatively affects the aligned checkpoint barrier propagation time or the unaligned checkpoint size.
-If you have a varying load in your Job (i.e. sudden spikes of incoming records, periodically firing windowed aggregations or joins), you might need to adjust the following settings:
+If your job's load varies (e.g. sudden spikes of incoming records, periodically firing windowed aggregations or joins), you might need to adjust the following settings:
-* `taskmanager.network.memory.buffer-debloat.period` - This is the minimum time period between buffer size recalculation. The shorter the period, the faster the reaction time of the debloating mechanism but the higher the CPU overhead for the necessary calculations.
+* `taskmanager.network.memory.buffer-debloat.period` - The minimum time between buffer size recalculations. The shorter the period, the faster the debloating mechanism reacts, but the more CPU the necessary calculations consume.
-* `taskmanager.network.memory.buffer-debloat.samples` - This adjusts the number of samples over which throughput measurements are averaged out. The frequency of the collected samples can be adjusted via `taskmanager.network.memory.buffer-debloat.period`. The fewer the samples, the faster the reaction time of the debloating mechanism, but a higher chance of a sudden spike or drop of the throughput which can cause the buffer debloating mechanism to miscalculate the optimal amount of in-flight data.
+* `taskmanager.network.memory.buffer-debloat.samples` - The number of samples over which throughput measurements are averaged. The sampling frequency can be adjusted via `taskmanager.network.memory.buffer-debloat.period`. The fewer the samples, the faster the debloating mechanism reacts, but a sudden spike or drop in throughput is then more likely to make it miscalculate the optimal amount of in-flight data.
-* `taskmanager.network.memory.buffer-debloat.threshold-percentages` - An optimization for preventing frequent buffer size changes (i.e. if the new size is not much different compared to the old size).
+* `taskmanager.network.memory.buffer-debloat.threshold-percentages` - An optimization that prevents frequent buffer size changes when the new size does not differ much from the old one.
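
As a rough illustration of how `period` and `samples` interact: the window over which throughput is averaged is their product. The values below are assumptions picked for the example, not necessarily Flink's defaults:

```python
# Back-of-the-envelope sketch of the throughput averaging window.
# Values are illustrative only, not authoritative Flink defaults.
period_ms = 200  # taskmanager.network.memory.buffer-debloat.period
samples = 20     # taskmanager.network.memory.buffer-debloat.samples

# Load changes shorter than this window are smoothed out of the estimate,
# so a smaller period or fewer samples means a faster (but noisier) reaction.
averaging_window_ms = period_ms * samples
print(averaging_window_ms)  # 4000
```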
-Consult the [configuration]({{< ref "docs/deployment/config" >}}#full-taskmanageroptions) documentation for more details and additional parameters.
+For more details and additional parameters, consult the [configuration]({{< ref "docs/deployment/config" >}}#full-taskmanageroptions) documentation.
-Here are [metrics]({{< ref "docs/ops/metrics" >}}#io) you can use to monitor the current buffer size:
-* `estimatedTimeToConsumeBuffersMs` - total time to consume data from all input channels
-* `debloatedBufferSize` - current buffer size
+You can use the following [metrics]({{< ref "docs/ops/metrics" >}}#io) to monitor the current buffer size:
+* `estimatedTimeToConsumeBuffersMs` - the total time to consume data from all input channels
+* `debloatedBufferSize` - the current buffer size
-### Limitations
+### Limitations
-Currently, there are a few cases that are not handled automatically by the buffer debloating mechanism.
+Currently, a few cases are not handled automatically by the buffer debloating mechanism.
-#### Large records
+#### Large records
-If your record size exceeds the [minimum memory segment size]({{< ref "docs/deployment/config" >}}#taskmanager-memory-min-segment-size), buffer debloating can potentially shrink the buffer size so much, that the network stack will require more than one buffer to transfer a single record. This can have adverse effects on the throughput, without actually reducing the amount of in-flight data.
+If a record exceeds the [minimum memory segment size]({{< ref "docs/deployment/config" >}}#taskmanager-memory-min-segment-size), buffer debloating may shrink the buffers so much that the network stack needs more than one buffer to transfer a single record. This can hurt throughput without actually reducing the amount of in-flight data.
-#### Multiple inputs and unions
+#### Multiple inputs and unions
-Currently, the throughput calculation and buffer debloating happen on the subtask level.
+Currently, throughput calculation and buffer debloating happen at the subtask level.
-If your subtask has multiple different inputs or it has a single but unioned input, buffer debloating can cause the input of the low throughput to have too much buffered in-flight data, while the input of the high throughput might have buffers that are too small to sustain that throughput. This might be particularly visible if the different inputs have vastly different throughputs. We recommend paying special attention to such subtasks when testing this feature.
+If your subtask has multiple different inputs, or a single but unioned input, buffer debloating can leave the low-throughput input with too much buffered in-flight data, while the high-throughput input may end up with buffers too small to sustain its throughput. This is particularly visible when the inputs' throughputs differ greatly. We recommend paying special attention to such subtasks when testing this feature.
-#### Buffer size and number of buffers
+#### Buffer size and number of buffers
-Currently, buffer debloating only caps at the maximal used buffer size. The actual buffer size and the number of buffers remain unchanged. This means that the debloating mechanism cannot reduce the memory usage of your job. You would have to manually reduce either the amount or the size of the buffers.
+Currently, buffer debloating only caps the maximum buffer size in use. The actual buffer size and the number of buffers remain unchanged, so the debloating mechanism does not reduce the memory usage of your job. You have to manually reduce either the number or the size of the buffers.
-Furthermore, if you want to reduce the amount of buffered in-flight data below what buffer debloating currently allows, you might want to manually configure the number of buffers.
+Furthermore, if you want to reduce the amount of buffered in-flight data below what buffer debloating currently allows, you may need to manually configure the number of buffers.
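
For example, the knobs involved could be lowered in `flink-conf.yaml` along these lines (the values shown are purely illustrative; suitable values depend on your deployment):

```yaml
# Illustrative values only - tune for your own deployment.
taskmanager.network.memory.buffers-per-channel: 2
taskmanager.network.memory.floating-buffers-per-gate: 8
taskmanager.memory.segment-size: 32kb
```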
-## Network buffer lifecycle
+## Network buffer lifecycle
-Flink has several local buffer pools - one for the output stream and one for each input gate.
-Each of those pools is limited to at most
+Flink has several local buffer pools - one for the output stream and one for each input gate. Each of those pools is limited to at most
`#channels * taskmanager.network.memory.buffers-per-channel + taskmanager.network.memory.floating-buffers-per-gate`
-The size of the buffer can be configured by setting `taskmanager.memory.segment-size`.
+The size of a buffer can be configured by setting `taskmanager.memory.segment-size`.
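
As a back-of-the-envelope sketch, the formula above caps a pool's buffer count and, together with the segment size, its memory footprint. The numbers below are assumed values for illustration, not defaults pulled from the documentation:

```python
# Rough estimate of the per-gate buffer cap and its memory footprint.
# All values are illustrative assumptions.
channels = 100                 # channels in one input gate
buffers_per_channel = 2        # taskmanager.network.memory.buffers-per-channel
floating_buffers_per_gate = 8  # taskmanager.network.memory.floating-buffers-per-gate
segment_size = 32 * 1024       # taskmanager.memory.segment-size, in bytes

# #channels * buffers-per-channel + floating-buffers-per-gate
max_buffers = channels * buffers_per_channel + floating_buffers_per_gate
max_memory_bytes = max_buffers * segment_size
print(max_buffers, max_memory_bytes)  # 208 6815744 (i.e. ~6.5 MiB)
```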
-### Input network buffers
+### Input network buffers
-Buffers in the input channel are divided into exclusive and floating buffers. Exclusive buffers can be used by only one particular channel. A channel can request additional floating buffers from a buffer pool shared across all channels belonging to the given input gate. The remaining floating buffers are optional and are acquired only if there are enough resources available.
+Buffers in an input channel are divided into exclusive and floating buffers. An exclusive buffer can be used by only one particular channel. A channel can request additional floating buffers from a pool shared across all channels of its input gate. The remaining floating buffers are optional and are acquired only when enough resources are available.
-In the initialization phase:
-- Flink will try to acquire the configured amount of exclusive buffers for each channel
-- all exclusive buffers must be fulfilled or the job will fail with an exception
-- a single floating buffer has to be allocated for Flink to be able to make progress
+In the initialization phase:
+- Flink tries to acquire the configured number of exclusive buffers for each input channel.
+- All exclusive buffers must be obtained, or the job fails with an exception.
+- A single floating buffer has to be allocated for Flink to be able to make progress.
-### Output network buffers
+### Output network buffers
-Unlike the input buffer pool, the output buffer pool has only one type of buffer which it shares among all subpartitions.
+Unlike the input buffer pool, the output buffer pool has only one type of buffer, which is shared among all subpartitions.
-In order to avoid excessive data skew, the number of buffers for each subpartition is limited by the `taskmanager.network.memory.max-buffers-per-channel` setting.
+To avoid excessive data skew, the number of buffers for each subpartition can be limited via the `taskmanager.network.memory.max-buffers-per-channel` setting.
-Like the input buffer pool, the configured amount of exclusive buffers and floating buffers is only treated as recommended values. If there are not enough buffers available, Flink can make progress with only a single exclusive buffer per output subpartition and zero floating buffers.
+Like the input buffer pool, the configured numbers of exclusive and floating buffers are only treated as recommended values. If not enough buffers are available, Flink can make progress with just a single exclusive buffer per output subpartition and zero floating buffers.
Review comment:
       @Myracle, @xintongsong, thanks for emphasizing this. It is indeed a mistake. I think the correct way to fix it is exactly what you proposed - instead of `Like the input buffer pool` it should be `Unlike the input buffer pool`.
       The main idea is that having at least one buffer for each subpartition is enough to make progress (it doesn't matter how many buffers are configured), while all exclusive buffers for the input channels have to be allocated on init, and if fewer buffers are available than configured, Flink will fail.
       @Myracle, can you fix this in the English version in a separate commit as well? Or I can do it later.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]