[GitHub] [flink] zhuzhurk commented on a change in pull request #18757: [FLINK-25226][doc] Add documentation about the AdaptiveBatchScheduler

GitBox Thu, 17 Feb 2022 20:52:53 -0800


zhuzhurk commented on a change in pull request #18757:
URL: https://github.com/apache/flink/pull/18757#discussion_r809646656




##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -25,39 +25,43 @@ under the License.
 
 ## Adaptive Batch Scheduler
 
-Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+Adaptive Batch Scheduler 
是一种可以自动推导每个算子并行度的批作业处理调度器。如果算子未设置并行度，调度器将根据其消费的数据量的大小来推导其并行度。这可以带来诸多好处：
 - 批作业用户可以从并行度调优中解脱出来
 - 根据数据量自动推导并行度可以更好地适应每天变化的数据量
-- SQL作业中的节点也可以分配不同的并行性
+- SQL作业中的算子也可以分配不同的并行性

Review comment:
       并行性 -> 并行度 (i.e. use the term for "degree of parallelism" rather than "parallel nature")

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -151,5 +151,45 @@ The behavior of Adaptive Scheduler is configured by [all 
configuration options c
 - **Unused slots**: If the max parallelism for slot sharing groups is not 
equal, slots offered to Adaptive Scheduler might be unused.
 - Scaling events trigger job and task restarts, which will increase the number 
of Task attempts.
 
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of 
operators for batch jobs. If an operator is not set with a parallelism, the 
scheduler will decide parallelism for it according to the size of its consumed 
datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can better fit consumed datasets which have 
a varying volume size every day
+- Operators from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of operators to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to:
+- Set the [`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`
+- Set the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) to `ALL-EXCHANGES-BLOCKING`(default value) 
due to ["ALL-EXCHANGES-BLOCKING jobs only"](#Limitations).

Review comment:
       > #Limitations
   
   Maybe use the anchor to the subsection `#all-exchanges-blocking-jobs-only`?

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -151,5 +151,45 @@ The behavior of Adaptive Scheduler is configured by [all 
configuration options c
 - **Unused slots**: If the max parallelism for slot sharing groups is not 
equal, slots offered to Adaptive Scheduler might be unused.
 - Scaling events trigger job and task restarts, which will increase the number 
of Task attempts.
 
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of 
operators for batch jobs. If an operator is not set with a parallelism, the 
scheduler will decide parallelism for it according to the size of its consumed 
datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can better fit consumed datasets which have 
a varying volume size every day
+- Operators from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of operators to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to:
+- Set the [`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`
+- Set the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) to `ALL-EXCHANGES-BLOCKING`(default value) 
due to ["ALL-EXCHANGES-BLOCKING jobs only"](#Limitations).
+
+In addition, there are several related configuration options that may need 
adjustment when using Adaptive Batch Scheduler:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The 
default parallelism of data source.
+
+#### Set the parallelism of operators to `-1`
+Adaptive Batch Scheduler will only decide parallelism for operators whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of operators to be decided automatically, you should configure as 
follows:
+- Set [`parallelism.default`]({{< ref "docs/deployment/config" 
>}}#parallelism-default) to `-1`
+- Set [`table.exec.resource.default-parallelism`]({{< ref 
"docs/deployment/config" >}}#table-exec-resource-default-parallelism) to `-1` 
in SQL jobs.
+- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
+
+### Performance tuning
+
+1. It's recommended to use [Sort 
Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the required network memory from parallelism, so that 
for large scale jobs, the "Insufficient number of network buffers" errors are 
less likely to happen.
+2. It's recommended to set 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to the parallelism you 
expect to need in the worst case. Values larger than this are not recommended, 
because excessive value may affect the performance. This option can affect the 
number of subpartitions produced by upstream tasks, large number of 
subpartitions may degrade the performance of hash shuffle and the performance 
of network transmission due to small packets.
+
+### Limitations
+
+- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
+- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`(Upstream and downstream tasks run sequentially in such 
jobs).

Review comment:
       > (Upstream and downstream tasks run sequentially in such jobs)
   
   Maybe remove it, because the description of `execution-batch-shuffle-mode` 
already explains it in a much more detailed way.

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -151,5 +151,45 @@ The behavior of Adaptive Scheduler is configured by [all 
configuration options c
 - **Unused slots**: If the max parallelism for slot sharing groups is not 
equal, slots offered to Adaptive Scheduler might be unused.
 - Scaling events trigger job and task restarts, which will increase the number 
of Task attempts.
 
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of 
operators for batch jobs. If an operator is not set with a parallelism, the 
scheduler will decide parallelism for it according to the size of its consumed 
datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can better fit consumed datasets which have 
a varying volume size every day
+- Operators from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of operators to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to:
+- Set the [`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`
+- Set the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) to `ALL-EXCHANGES-BLOCKING`(default value) 
due to ["ALL-EXCHANGES-BLOCKING jobs only"](#Limitations).
+
+In addition, there are several related configuration options that may need 
adjustment when using Adaptive Batch Scheduler:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The 
default parallelism of data source.
+
+#### Set the parallelism of operators to `-1`
+Adaptive Batch Scheduler will only decide parallelism for operators whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of operators to be decided automatically, you should configure as 
follows:
+- Set [`parallelism.default`]({{< ref "docs/deployment/config" 
>}}#parallelism-default) to `-1`
+- Set [`table.exec.resource.default-parallelism`]({{< ref 
"docs/deployment/config" >}}#table-exec-resource-default-parallelism) to `-1` 
in SQL jobs.
+- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
+
+### Performance tuning
+
+1. It's recommended to use [Sort 
Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the required network memory from parallelism, so that 
for large scale jobs, the "Insufficient number of network buffers" errors are 
less likely to happen.
+2. It's recommended to set 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to the parallelism you 
expect to need in the worst case. Values larger than this are not recommended, 
because excessive value may affect the performance. This option can affect the 
number of subpartitions produced by upstream tasks, large number of 
subpartitions may degrade the performance of hash shuffle and the performance 
of network transmission due to small packets.
+
+### Limitations
+
+- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.

Review comment:
       Maybe add one line: "An exception will be thrown if a streaming job is 
submitted."

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -25,39 +25,43 @@ under the License.
 
 ## Adaptive Batch Scheduler
 
-Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+Adaptive Batch Scheduler 
是一种可以自动推导每个算子并行度的批作业处理调度器。如果算子未设置并行度，调度器将根据其消费的数据量的大小来推导其并行度。这可以带来诸多好处：
 - 批作业用户可以从并行度调优中解脱出来
 - 根据数据量自动推导并行度可以更好地适应每天变化的数据量
-- SQL作业中的节点也可以分配不同的并行性
+- SQL作业中的算子也可以分配不同的并行性
 
-### Usage
+### 用法
 
-使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+使用 Adaptive Batch Scheduler 自动推导算子的并行度，需要：
 - 启用 Adaptive Batch Scheduler
-- 配置节点的并行度为 `-1`
+- 配置算子的并行度为 `-1`
 
 #### 启用 Adaptive Batch Scheduler
-为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许设置的并行度最小值
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许设置的并行度最大值
+为了启用 Adaptive Batch Scheduler, 你需要：
+- 将[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。
+- 由于 ["ALL-EXCHANGES-BLOCKING jobs only"](#限制), 
需要将[`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。
+
+除此之外，使用 Adaptive Batch Scheduler 时，以下相关配置也可以调整:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值
 - [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): 期望每个任务处理的数据量大小
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
节点的默认并行度
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
算子的默认并行度
 
-#### 配置节点的并行度为 `-1`
-Adaptive Batch Scheduler 只会为用户未指定并行度的作业节点（并行度为 `-1`）推导并行度。 
所以如果你想自动推导节点的并行度，需要进行以下配置：
+#### 配置算子的并行度为 `-1`
+Adaptive Batch Scheduler 只会为用户未指定并行度的算子（并行度为 `-1`）推导并行度。 
所以如果你想自动推导算子的并行度，需要进行以下配置：
 - 配置 `parallelism.default` 为 `-1`
 - 对于 SQL 作业，需要配置 `table.exec.resource.default-parallelism` 为 `-1`
-- 对于 DataStream 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
+- 对于 DataStream/DataSet 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
 
 ### 性能调优
 
-1. 建议使用 `Sort Shuffle` 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这降低了遇到 "Insufficient number of network buffers" 
错误的可能性。
-2. 不建议为 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
配置太大的值，否则会影响性能。因为这个选项会影响上游任务产出的 subpartition 的数量，过多的 subpartition 可能会影响 hash 
shuffle 的性能，或者由于小包影响网络传输的性能。
+1. 建议使用 [Sort 
Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这样可以降低遇到 "Insufficient number of network buffers" 
错误的可能性。
+2. 建议将 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
设置为最坏情况下预期需要的并行度。不建议配置太大的值，因为值过大可能会影响性能。这个选项会影响上游任务产出的 subpartition 的数量，过多的 
subpartition 可能会影响 hash shuffle 的性能，或者由于小包影响网络传输的性能。
 
 ### 限制
-
-- **ALL-EDGES-BLOCKING batch jobs only**: 目前 Adaptive Batch Scheduler 只支持 
ALL-EDGES-BLOCKING 的批作业。
-- **Inconsistent broadcast results metrics on WebUI**: 在使用 Adaptive Batch 
Scheduler 时，对于 broadcast 边，上游节点发送的数据量和下游节点接收的数据量可能会不相等，这在显示上会困扰用户。细节详见 
[FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler)
+- **Batch jobs only**: Adaptive Batch Scheduler 只支持批作业.
+- **ALL-EXCHANGES-BLOCKING jobs only**: 目前 Adaptive Batch Scheduler 只支持 
[shuffle mode]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 为 ALL-EXCHANGES-BLOCKING 的作业。

Review comment:
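       **ALL-EXCHANGES-BLOCKING jobs only** -> **只支持所有数据交换都为 BLOCKING 模式的作业** (i.e. translate the heading as "Only jobs whose data exchanges are all in BLOCKING mode are supported")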

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -151,5 +151,45 @@ The behavior of Adaptive Scheduler is configured by [all 
configuration options c
 - **Unused slots**: If the max parallelism for slot sharing groups is not 
equal, slots offered to Adaptive Scheduler might be unused.
 - Scaling events trigger job and task restarts, which will increase the number 
of Task attempts.
 
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of 
operators for batch jobs. If an operator is not set with a parallelism, the 
scheduler will decide parallelism for it according to the size of its consumed 
datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can better fit consumed datasets which have 
a varying volume size every day
+- Operators from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of operators to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to:
+- Set the [`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`
+- Set the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) to `ALL-EXCHANGES-BLOCKING`(default value) 
due to ["ALL-EXCHANGES-BLOCKING jobs only"](#Limitations).
+
+In addition, there are several related configuration options that may need 
adjustment when using Adaptive Batch Scheduler:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
+- [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): The size of data 
volume to expect each task instance to process
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The 
default parallelism of data source.
+
+#### Set the parallelism of operators to `-1`
+Adaptive Batch Scheduler will only decide parallelism for operators whose 
parallelism is not specified by users (parallelism is `-1`). So if you want the 
parallelism of operators to be decided automatically, you should configure as 
follows:
+- Set [`parallelism.default`]({{< ref "docs/deployment/config" 
>}}#parallelism-default) to `-1`
+- Set [`table.exec.resource.default-parallelism`]({{< ref 
"docs/deployment/config" >}}#table-exec-resource-default-parallelism) to `-1` 
in SQL jobs.
+- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
+
+### Performance tuning
+
+1. It's recommended to use [Sort 
Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) and set 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to 
`0`. This can decouple the required network memory from parallelism, so that 
for large scale jobs, the "Insufficient number of network buffers" errors are 
less likely to happen.
+2. It's recommended to set 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to the parallelism you 
expect to need in the worst case. Values larger than this are not recommended, 
because excessive value may affect the performance. This option can affect the 
number of subpartitions produced by upstream tasks, large number of 
subpartitions may degrade the performance of hash shuffle and the performance 
of network transmission due to small packets.

Review comment:
       this -> that
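
For context on why the upper bound matters, here is a rough Python sketch of how a parallelism could be derived from the consumed data size and then clamped by the min/max options discussed in this section. It is an illustration under stated assumptions, not the scheduler's actual algorithm: the function name `decide_parallelism` and its parameters are hypothetical, and the real scheduler may normalize the result further.

```python
def decide_parallelism(consumed_bytes: int,
                       data_volume_per_task: int,
                       min_parallelism: int,
                       max_parallelism: int) -> int:
    """Hypothetical sketch: one task per `data_volume_per_task` bytes, clamped to the bounds."""
    if data_volume_per_task <= 0:
        raise ValueError("data_volume_per_task must be positive")
    if min_parallelism > max_parallelism:
        raise ValueError("min_parallelism must not exceed max_parallelism")
    # Integer ceiling division avoids floating-point rounding on large byte counts.
    wanted = -(-consumed_bytes // data_volume_per_task)
    # Clamp into [min_parallelism, max_parallelism]; an overly large upper bound
    # would let `wanted` grow with the data and inflate the subpartition count.
    return max(min_parallelism, min(max_parallelism, wanted))
```

In this sketch, setting the upper bound to the worst-case parallelism you expect keeps the result from growing beyond what the job needs.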

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -25,39 +25,43 @@ under the License.
 
 ## Adaptive Batch Scheduler
 
-Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+Adaptive Batch Scheduler 
是一种可以自动推导每个算子并行度的批作业处理调度器。如果算子未设置并行度，调度器将根据其消费的数据量的大小来推导其并行度。这可以带来诸多好处：
 - 批作业用户可以从并行度调优中解脱出来
 - 根据数据量自动推导并行度可以更好地适应每天变化的数据量
-- SQL作业中的节点也可以分配不同的并行性
+- SQL作业中的算子也可以分配不同的并行性
 
-### Usage
+### 用法
 
-使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+使用 Adaptive Batch Scheduler 自动推导算子的并行度，需要：
 - 启用 Adaptive Batch Scheduler
-- 配置节点的并行度为 `-1`
+- 配置算子的并行度为 `-1`
 
 #### 启用 Adaptive Batch Scheduler
-为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许设置的并行度最小值
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许设置的并行度最大值
+为了启用 Adaptive Batch Scheduler, 你需要：
+- 将[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。
+- 由于 ["ALL-EXCHANGES-BLOCKING jobs only"](#限制), 
需要将[`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。
+
+除此之外，使用 Adaptive Batch Scheduler 时，以下相关配置也可以调整:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值
 - [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): 期望每个任务处理的数据量大小
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
节点的默认并行度
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
算子的默认并行度
 
-#### 配置节点的并行度为 `-1`
-Adaptive Batch Scheduler 只会为用户未指定并行度的作业节点（并行度为 `-1`）推导并行度。 
所以如果你想自动推导节点的并行度，需要进行以下配置：
+#### 配置算子的并行度为 `-1`
+Adaptive Batch Scheduler 只会为用户未指定并行度的算子（并行度为 `-1`）推导并行度。 
所以如果你想自动推导算子的并行度，需要进行以下配置：
 - 配置 `parallelism.default` 为 `-1`
 - 对于 SQL 作业，需要配置 `table.exec.resource.default-parallelism` 为 `-1`
-- 对于 DataStream 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
+- 对于 DataStream/DataSet 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
 
 ### 性能调优
 
-1. 建议使用 `Sort Shuffle` 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这降低了遇到 "Insufficient number of network buffers" 
错误的可能性。
-2. 不建议为 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
配置太大的值，否则会影响性能。因为这个选项会影响上游任务产出的 subpartition 的数量，过多的 subpartition 可能会影响 hash 
shuffle 的性能，或者由于小包影响网络传输的性能。
+1. 建议使用 [Sort 
Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这样可以降低遇到 "Insufficient number of network buffers" 
错误的可能性。
+2. 建议将 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
设置为最坏情况下预期需要的并行度。不建议配置太大的值，因为值过大可能会影响性能。这个选项会影响上游任务产出的 subpartition 的数量，过多的 
subpartition 可能会影响 hash shuffle 的性能，或者由于小包影响网络传输的性能。
 
 ### 限制
-
-- **ALL-EDGES-BLOCKING batch jobs only**: 目前 Adaptive Batch Scheduler 只支持 
ALL-EDGES-BLOCKING 的批作业。
-- **Inconsistent broadcast results metrics on WebUI**: 在使用 Adaptive Batch 
Scheduler 时，对于 broadcast 边，上游节点发送的数据量和下游节点接收的数据量可能会不相等，这在显示上会困扰用户。细节详见 
[FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler)
+- **Batch jobs only**: Adaptive Batch Scheduler 只支持批作业.

Review comment:
       > **Batch jobs only**: Adaptive Batch Scheduler 只支持批作业.
   
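   -> **只支持批作业**: Adaptive Batch Scheduler 只支持批作业。当提交的是一个流作业时，会抛出异常。 (i.e. "**Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. An exception is thrown when a streaming job is submitted.")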
   

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -151,5 +151,45 @@ The behavior of Adaptive Scheduler is configured by [all 
configuration options c
 - **Unused slots**: If the max parallelism for slot sharing groups is not 
equal, slots offered to Adaptive Scheduler might be unused.
 - Scaling events trigger job and task restarts, which will increase the number 
of Task attempts.
 
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of 
operators for batch jobs. If an operator is not set with a parallelism, the 
scheduler will decide parallelism for it according to the size of its consumed 
datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can better fit consumed datasets which have 
a varying volume size every day
+- Operators from SQL batch jobs can be assigned with different parallelisms 
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for operators with Adaptive Batch 
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of operators to `-1`.
+  
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to:
+- Set the [`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) to `AdaptiveBatch`
+- Set the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) to `ALL-EXCHANGES-BLOCKING`(default value) 
due to ["ALL-EXCHANGES-BLOCKING jobs only"](#Limitations).

Review comment:
       Given that `ALL-EXCHANGES-BLOCKING` is the default value, maybe change 
the statement to something like "Leave `execution.batch-shuffle-mode` unset or 
explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value)".
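
To make the suggested wording concrete, here is a minimal Python sketch of the condition it describes: the shuffle mode is acceptable when the key is absent or explicitly set to the required value. The helper `shuffle_mode_ok` and the plain-dict configuration are hypothetical and not part of Flink; the value spelling is copied from the doc text under review.

```python
# Spelling taken from the documentation text under review.
REQUIRED_MODE = "ALL-EXCHANGES-BLOCKING"

def shuffle_mode_ok(conf: dict) -> bool:
    """Hypothetical check: True if execution.batch-shuffle-mode is unset or the required mode."""
    mode = conf.get("execution.batch-shuffle-mode")
    # Unset means the default applies, which the doc states is the required mode.
    return mode is None or mode == REQUIRED_MODE
```

Under this reading, a configuration that only sets `jobmanager.scheduler` passes the check, which is why "leave unset" is as valid as setting the value explicitly.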

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -25,39 +25,43 @@ under the License.
 
 ## Adaptive Batch Scheduler
 
-Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+Adaptive Batch Scheduler 
是一种可以自动推导每个算子并行度的批作业处理调度器。如果算子未设置并行度，调度器将根据其消费的数据量的大小来推导其并行度。这可以带来诸多好处：
 - 批作业用户可以从并行度调优中解脱出来
 - 根据数据量自动推导并行度可以更好地适应每天变化的数据量
-- SQL作业中的节点也可以分配不同的并行性
+- SQL作业中的算子也可以分配不同的并行性
 
-### Usage
+### 用法
 
-使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+使用 Adaptive Batch Scheduler 自动推导算子的并行度，需要：
 - 启用 Adaptive Batch Scheduler
-- 配置节点的并行度为 `-1`
+- 配置算子的并行度为 `-1`
 
 #### 启用 Adaptive Batch Scheduler
-为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许设置的并行度最小值
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许设置的并行度最大值
+为了启用 Adaptive Batch Scheduler, 你需要：
+- 将[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。
+- 由于 ["ALL-EXCHANGES-BLOCKING jobs only"](#限制), 
需要将[`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。
+
+除此之外，使用 Adaptive Batch Scheduler 时，以下相关配置也可以调整:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值
 - [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): 期望每个任务处理的数据量大小
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
节点的默认并行度
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
算子的默认并行度
 
-#### 配置节点的并行度为 `-1`
-Adaptive Batch Scheduler 只会为用户未指定并行度的作业节点（并行度为 `-1`）推导并行度。 
所以如果你想自动推导节点的并行度，需要进行以下配置：
+#### 配置算子的并行度为 `-1`
+Adaptive Batch Scheduler 只会为用户未指定并行度的算子（并行度为 `-1`）推导并行度。 
所以如果你想自动推导算子的并行度，需要进行以下配置：
 - 配置 `parallelism.default` 为 `-1`
 - 对于 SQL 作业，需要配置 `table.exec.resource.default-parallelism` 为 `-1`
-- 对于 DataStream 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
+- 对于 DataStream/DataSet 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
 
 ### 性能调优
 
-1. 建议使用 `Sort Shuffle` 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这降低了遇到 "Insufficient number of network buffers" 
错误的可能性。
-2. 不建议为 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
配置太大的值，否则会影响性能。因为这个选项会影响上游任务产出的 subpartition 的数量，过多的 subpartition 可能会影响 hash 
shuffle 的性能，或者由于小包影响网络传输的性能。
+1. 建议使用 [Sort 
Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这样可以降低遇到 "Insufficient number of network buffers" 
错误的可能性。

Review comment:
       并发与网络内存使用量 -> 并行度与需要的网络内存 (i.e. "concurrency and network memory usage" -> "parallelism and required network memory")

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -25,39 +25,43 @@ under the License.
 
 ## Adaptive Batch Scheduler
 
-Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+Adaptive Batch Scheduler 
是一种可以自动推导每个算子并行度的批作业处理调度器。如果算子未设置并行度，调度器将根据其消费的数据量的大小来推导其并行度。这可以带来诸多好处：
 - 批作业用户可以从并行度调优中解脱出来
 - 根据数据量自动推导并行度可以更好地适应每天变化的数据量
-- SQL作业中的节点也可以分配不同的并行性
+- SQL作业中的算子也可以分配不同的并行性
 
-### Usage
+### 用法
 
-使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+使用 Adaptive Batch Scheduler 自动推导算子的并行度，需要：
 - 启用 Adaptive Batch Scheduler
-- 配置节点的并行度为 `-1`
+- 配置算子的并行度为 `-1`
 
 #### 启用 Adaptive Batch Scheduler
-为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许设置的并行度最小值
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许设置的并行度最大值
+为了启用 Adaptive Batch Scheduler, 你需要：
+- 将[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。
+- 由于 ["ALL-EXCHANGES-BLOCKING jobs only"](#限制), 
需要将[`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。
+
+除此之外，使用 Adaptive Batch Scheduler 时，以下相关配置也可以调整:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值
 - [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): 期望每个任务处理的数据量大小
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
节点的默认并行度
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
算子的默认并行度
 
-#### 配置节点的并行度为 `-1`
-Adaptive Batch Scheduler 只会为用户未指定并行度的作业节点（并行度为 `-1`）推导并行度。 
所以如果你想自动推导节点的并行度，需要进行以下配置：
+#### 配置算子的并行度为 `-1`
+Adaptive Batch Scheduler 只会为用户未指定并行度的算子（并行度为 `-1`）推导并行度。 
所以如果你想自动推导算子的并行度，需要进行以下配置：
 - 配置 `parallelism.default` 为 `-1`
 - 对于 SQL 作业，需要配置 `table.exec.resource.default-parallelism` 为 `-1`
-- 对于 DataStream 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
+- 对于 DataStream/DataSet 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
 
 ### 性能调优
 
-1. 建议使用 `Sort Shuffle` 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这降低了遇到 "Insufficient number of network buffers" 
错误的可能性。
-2. 不建议为 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
配置太大的值，否则会影响性能。因为这个选项会影响上游任务产出的 subpartition 的数量，过多的 subpartition 可能会影响 hash 
shuffle 的性能，或者由于小包影响网络传输的性能。
+1. 建议使用 [Sort 
Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这样可以降低遇到 "Insufficient number of network buffers" 
错误的可能性。
+2. 建议将 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
设置为最坏情况下预期需要的并行度。不建议配置太大的值，因为值过大可能会影响性能。这个选项会影响上游任务产出的 subpartition 的数量，过多的 
subpartition 可能会影响 hash shuffle 的性能，或者由于小包影响网络传输的性能。

Review comment:
       因为值过大 -> 否则 (i.e. "because too large a value" -> "otherwise")
   
       选项 -> 配置项 (i.e. "option" -> "configuration item")

##########
File path: docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
##########
@@ -25,39 +25,43 @@ under the License.
 
 ## Adaptive Batch Scheduler
 
-Adaptive Batch Scheduler 
是一种可以自动推导每个节点并行度的调度器。如果节点未设置并行度，调度器将根据其消耗的数据量的大小来推导其并行度。这可以带来诸多好处：
+Adaptive Batch Scheduler 
是一种可以自动推导每个算子并行度的批作业处理调度器。如果算子未设置并行度，调度器将根据其消费的数据量的大小来推导其并行度。这可以带来诸多好处：
 - 批作业用户可以从并行度调优中解脱出来
 - 根据数据量自动推导并行度可以更好地适应每天变化的数据量
-- SQL作业中的节点也可以分配不同的并行性
+- SQL作业中的算子也可以分配不同的并行性
 
-### Usage
+### 用法
 
-使用 Adaptive Batch Scheduler 自动推导作业节点的并行度，需要：
+使用 Adaptive Batch Scheduler 自动推导算子的并行度，需要：
 - 启用 Adaptive Batch Scheduler
-- 配置节点的并行度为 `-1`
+- 配置算子的并行度为 `-1`
 
 #### 启用 Adaptive Batch Scheduler
-为了启用 Adaptive Batch Scheduler, 你需要将 [`jobmanager.scheduler`]({{< ref 
"docs/deployment/config" >}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。除此之外，使用 
Adaptive Batch Scheduler 时，以下配置也可以选择性配置:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许设置的并行度最小值
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许设置的并行度最大值
+为了启用 Adaptive Batch Scheduler, 你需要：
+- 将[`jobmanager.scheduler`]({{< ref "docs/deployment/config" 
>}}#jobmanager-scheduler) 配置为 `AdpaptiveBatch`。
+- 由于 ["ALL-EXCHANGES-BLOCKING jobs only"](#限制), 
需要将[`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。
+
+除此之外，使用 Adaptive Batch Scheduler 时，以下相关配置也可以调整:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): 允许自动设置的并行度最小值
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): 允许自动设置的并行度最大值
 - [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): 期望每个任务处理的数据量大小
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
节点的默认并行度
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): source 
算子的默认并行度
 
-#### 配置节点的并行度为 `-1`
-Adaptive Batch Scheduler 只会为用户未指定并行度的作业节点（并行度为 `-1`）推导并行度。 
所以如果你想自动推导节点的并行度，需要进行以下配置：
+#### 配置算子的并行度为 `-1`
+Adaptive Batch Scheduler 只会为用户未指定并行度的算子（并行度为 `-1`）推导并行度。 
所以如果你想自动推导算子的并行度，需要进行以下配置：
 - 配置 `parallelism.default` 为 `-1`
 - 对于 SQL 作业，需要配置 `table.exec.resource.default-parallelism` 为 `-1`
-- 对于 DataStream 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
+- 对于 DataStream/DataSet 作业，不要在作业中通过算子的 `setParallelism()` 方法来指定并行度
 
 ### 性能调优
 
-1. 建议使用 `Sort Shuffle` 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这降低了遇到 "Insufficient number of network buffers" 
错误的可能性。
-2. 不建议为 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
配置太大的值，否则会影响性能。因为这个选项会影响上游任务产出的 subpartition 的数量，过多的 subpartition 可能会影响 hash 
shuffle 的性能，或者由于小包影响网络传输的性能。
+1. 建议使用 [Sort 
Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) 并且设置 
[`taskmanager.network.memory.buffers-per-channel`]({{< ref 
"docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) 为 
`0`。 这会解耦并发与网络内存使用量，对于大规模作业，这样可以降低遇到 "Insufficient number of network buffers" 
错误的可能性。
+2. 建议将 [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) 
设置为最坏情况下预期需要的并行度。不建议配置太大的值，因为值过大可能会影响性能。这个选项会影响上游任务产出的 subpartition 的数量，过多的 
subpartition 可能会影响 hash shuffle 的性能，或者由于小包影响网络传输的性能。
 
 ### 限制
-
-- **ALL-EDGES-BLOCKING batch jobs only**: 目前 Adaptive Batch Scheduler 只支持 
ALL-EDGES-BLOCKING 的批作业。
-- **Inconsistent broadcast results metrics on WebUI**: 在使用 Adaptive Batch 
Scheduler 时，对于 broadcast 边，上游节点发送的数据量和下游节点接收的数据量可能会不相等，这在显示上会困扰用户。细节详见 
[FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler)
+- **Batch jobs only**: Adaptive Batch Scheduler 只支持批作业.
+- **ALL-EXCHANGES-BLOCKING jobs only**: 目前 Adaptive Batch Scheduler 只支持 
[shuffle mode]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) 为 ALL-EXCHANGES-BLOCKING 的作业。
+- **Inconsistent broadcast results metrics on WebUI**: 在使用 Adaptive Batch 
Scheduler 时，对于 broadcast 边，上游算子发送的数据量和下游算子接收的数据量可能会不相等，这在 Web UI 
的显示上可能会困扰用户。细节详见 
[FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler)

Review comment:
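       **Inconsistent broadcast results metrics on WebUI** -> **Web UI 上展示的上游输出的数据量和下游收到的数据量可能不一致**
       (i.e. "The upstream output data volume and the downstream received data volume shown on the Web UI may be inconsistent")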
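
For reference, the settings described in the reviewed documentation could be collected in one configuration fragment, roughly as sketched below. This is only an illustration: the option keys are the ones named in the diff, but every numeric value is a placeholder chosen for the example, and the scheduler value is written as `AdaptiveBatch` on the assumption that the `AdpaptiveBatch` spelling in the diff is a typo.

```yaml
# Sketch only: numeric values are illustrative placeholders, not recommendations.

# Enable the Adaptive Batch Scheduler.
# Assumption: the diff spells this value `AdpaptiveBatch`; `AdaptiveBatch` is assumed to be intended.
jobmanager.scheduler: AdaptiveBatch

# execution.batch-shuffle-mode is deliberately left unset here, because the
# documentation states that the required ALL-EXCHANGES-BLOCKING mode is the default.

# Leave operator parallelism unset (-1) so that the scheduler derives it.
parallelism.default: -1
# Only relevant for SQL jobs.
table.exec.resource.default-parallelism: -1

# Optional bounds and sizing for the derived parallelism (placeholder values).
jobmanager.adaptive-batch-scheduler.min-parallelism: 1
jobmanager.adaptive-batch-scheduler.max-parallelism: 128
jobmanager.adaptive-batch-scheduler.data-volume-per-task: 1g
jobmanager.adaptive-batch-scheduler.default-source-parallelism: 4

# Performance tuning: decouple parallelism from the network memory required.
taskmanager.network.memory.buffers-per-channel: 0
```

For DataStream/DataSet jobs, the documentation additionally asks that `setParallelism()` not be called on operators, which a configuration file cannot enforce.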



