[GitHub] [flink] zhuzhurk commented on a diff in pull request #21801: [FLINK-30838][doc] Update documentation about the AdaptiveBatchScheduler

via GitHub Wed, 08 Feb 2023 01:08:16 -0800


zhuzhurk commented on code in PR #21801:
URL: https://github.com/apache/flink/pull/21801#discussion_r1099832977



##########
docs/content.zh/docs/deployment/elastic_scaling.md:
##########
@@ -150,7 +150,7 @@ Adaptive 调度器可以通过[所有在名字包含 `adaptive-scheduler` 的配
 
 ## Adaptive Batch Scheduler
 
-Adaptive Batch Scheduler 是一种可以自动推导每个算子并行度的批作业调度器。如果算子未设置并行度，调度器将根据其消费的数据量的大小来推导其并行度。这可以带来诸多好处：
+Adaptive Batch Scheduler 是一种可以自动调整执行计划的批作业调度器，目前支持自动推导每个算子并行度，如果算子未设置并行度，调度器将根据其消费的数据量的大小来推导其并行度。这可以带来诸多好处：

Review Comment:
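   批作业调度器，目前支持自动推导每个算子并行度 -> 批作业调度器。它目前支持自动推导算子并行度
   (In English: "a batch job scheduler, currently supports automatically deriving the parallelism of each operator" -> "a batch job scheduler. It currently supports automatically deriving operator parallelism")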



##########
docs/content.zh/docs/deployment/elastic_scaling.md:
##########
@@ -161,14 +161,15 @@ Adaptive Batch Scheduler 是一种可以自动推导每个算子并行度的批
 - 启用 Adaptive Batch Scheduler
 - 不要指定算子的并行度
 
-#### 启用 Adaptive Batch Scheduler
+#### 启用 Adaptive Batch Scheduler 自动推导并行度
 当前 Adaptive Batch Scheduler 是 Flink 默认的批作业调度器，无需额外配置。除非用户显式的配置了使用其他调度器，例如 `jobmanager.scheduler: default`。需要注意的是，由于 ["只支持所有数据交换都为 BLOCKING 模式的作业"](#局限性-2), 需要将 [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。

Review Comment:
   It would be better to add a section "启用 Adaptive Batch Scheduler" ("Enable the Adaptive Batch Scheduler") above and move this paragraph there.



##########
docs/content.zh/docs/deployment/elastic_scaling.md:
##########
@@ -161,14 +161,15 @@ Adaptive Batch Scheduler 是一种可以自动推导每个算子并行度的批
 - 启用 Adaptive Batch Scheduler
 - 不要指定算子的并行度
 
-#### 启用 Adaptive Batch Scheduler
+#### 启用 Adaptive Batch Scheduler 自动推导并行度

Review Comment:
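   启用 Adaptive Batch Scheduler 自动推导并行度 -> 启用自动并行度推导
   (In English: "Enable the Adaptive Batch Scheduler to automatically derive parallelism" -> "Enable automatic parallelism derivation")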



##########
docs/content.zh/docs/deployment/elastic_scaling.md:
##########
@@ -161,14 +161,15 @@ Adaptive Batch Scheduler 是一种可以自动推导每个算子并行度的批
 - 启用 Adaptive Batch Scheduler
 - 不要指定算子的并行度
 
-#### 启用 Adaptive Batch Scheduler
+#### 启用 Adaptive Batch Scheduler 自动推导并行度
 当前 Adaptive Batch Scheduler 是 Flink 默认的批作业调度器，无需额外配置。除非用户显式的配置了使用其他调度器，例如 `jobmanager.scheduler: default`。需要注意的是，由于 ["只支持所有数据交换都为 BLOCKING 模式的作业"](#局限性-2), 需要将 [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) 配置为 `ALL-EXCHANGES-BLOCKING`(默认值) 。
 
-除此之外，使用 Adaptive Batch Scheduler 时，以下相关配置也可以调整:
+除此之外，使用 Adaptive Batch Scheduler 自动推导并行度时，以下相关配置也可以调整:
+- [`execution.batch.adaptive.auto-parallelism.enabled`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-enabled): 是否开启并行度推导，默认开启。

Review Comment:
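   I would rephrase it as "Adaptive Batch Scheduler 默认启用了自动并行度推导，你可以通过 xxx 配置来开关此功能。除此之外，你也可以根据作业的情况调整以下配置:"
   (In English: "The Adaptive Batch Scheduler enables automatic parallelism derivation by default; you can switch this feature on or off via the xxx option. In addition, you can tune the following options to suit your job:")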
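
   For reference, the options named in this hunk could be written together roughly as follows. This is only a sketch: the file name `flink-conf.yaml` and the value `AdaptiveBatch` are assumptions not shown in the diff, which only mentions `default` as the opt-out value.

   ```yaml
   # Assumed location: conf/flink-conf.yaml (not stated in the diff).
   # Assumption: `AdaptiveBatch` selects the Adaptive Batch Scheduler explicitly;
   # per the hunk it is already the default for batch jobs, so this line is optional.
   jobmanager.scheduler: AdaptiveBatch
   # Only all-blocking data exchanges are supported; the hunk states this is the default.
   execution.batch-shuffle-mode: ALL-EXCHANGES-BLOCKING
   # Switch for automatic parallelism derivation; the hunk states it is enabled by default.
   execution.batch.adaptive.auto-parallelism.enabled: true
   ```

   Since all three values match the defaults described in the hunk, spelling them out mainly documents intent; setting the last option to `false` would turn the derivation off.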



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -190,6 +191,6 @@ In addition, the following configurations are required for DataSet jobs:
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch Scheduler only supports jobs whose [shuffle mode]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) is `ALL-EXCHANGES-BLOCKING`.
 - **FileInputFormat sources are not supported**: FileInputFormat sources are not supported, including `StreamExecutionEnvironment#readFile(...)` `StreamExecutionEnvironment#readTextFile(...)` and `StreamExecutionEnvironment#createInput(FileInputFormat, ...)`. Users should use the new sources([FileSystem DataStream Connector]({{< ref "docs/connectors/datastream/filesystem.md" >}}) or [FileSystem SQL Connector]({{< ref "docs/connectors/table/filesystem.md" >}})) to read files when using the Adaptive Batch Scheduler.
-- **Inconsistent broadcast results metrics on WebUI**: In Adaptive Batch Scheduler, for broadcast results, the number of bytes/records sent by the upstream task counted by metric is not equal to the number of bytes/records received by the downstream task, which may confuse users when displayed on the Web UI. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler) for details.
+- **Inconsistent broadcast results metrics on WebUI**: When use Adaptive Batch Scheduler automatically decide parallelisms for operators, for broadcast results, the number of bytes/records sent by the upstream task counted by metric is not equal to the number of bytes/records received by the downstream task, which may confuse users when displayed on the Web UI. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler) for details.

Review Comment:
   automatically -> to automatically



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -152,7 +152,7 @@ The behavior of Adaptive Scheduler is configured by [all configuration options c
 
 ## Adaptive Batch Scheduler
 
-The Adaptive Batch Scheduler can automatically decide parallelisms of operators for batch jobs. If an operator is not set with a parallelism, the scheduler will decide parallelism for it according to the size of its consumed datasets. This can bring many benefits:
+The Adaptive Batch Scheduler is a batch job scheduler that can automatically adjust the execution plan and currently support automatically decide parallelisms of operators for batch jobs. If an operator is not set with a parallelism, the scheduler will decide parallelism for it according to the size of its consumed datasets. This can bring many benefits:

Review Comment:
   plan and currently -> plan. It currently
   support -> supports
   decide -> deciding



