[GitHub] [flink] zhuzhurk commented on a diff in pull request #21801: [FLINK-30838][doc] Update documentation about the AdaptiveBatchScheduler

via GitHub Wed, 08 Feb 2023 02:56:37 -0800


zhuzhurk commented on code in PR #21801:
URL: https://github.com/apache/flink/pull/21801#discussion_r1099966226



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -152,45 +152,46 @@ The behavior of Adaptive Scheduler is configured by [all configuration options c
 
 ## Adaptive Batch Scheduler
 
-The Adaptive Batch Scheduler can automatically decide parallelisms of operators for batch jobs. If an operator is not set with a parallelism, the scheduler will decide parallelism for it according to the size of its consumed datasets. This can bring many benefits:
+The Adaptive Batch Scheduler is a batch job scheduler that can automatically adjust the execution plan. It currently supports automatically deciding parallelisms of operators for batch jobs. If an operator is not set with a parallelism, the scheduler will decide parallelism for it according to the size of its consumed datasets. This can bring many benefits:
 - Batch job users can be relieved from parallelism tuning
 - Automatically tuned parallelisms can better fit consumed datasets which have a varying volume size every day
 - Operators from SQL batch jobs can be assigned with different parallelisms which are automatically tuned
 
-### Usage
+At present, the Adaptive Batch Scheduler is the default scheduler for Flink batch jobs. No additional configuration is required unless other schedulers are explicitly configured, e.g. `jobmanager.scheduler: default`. Note that you need to
+leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2).
+
+### Automatically decide parallelisms for operators
+
+#### Usage
 
 To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to:
-- Configure to use Adaptive Batch Scheduler.
-- Set the parallelism of operators to `-1`.
+- Configure to automatically decide parallelisms for operators: 

Review Comment:
   ```suggestion
   - Toggle the feature on:
   ```



##########
docs/content/docs/deployment/elastic_scaling.md:
##########
@@ -152,45 +152,46 @@ The behavior of Adaptive Scheduler is configured by [all configuration options c
 
 ## Adaptive Batch Scheduler
 
-The Adaptive Batch Scheduler can automatically decide parallelisms of operators for batch jobs. If an operator is not set with a parallelism, the scheduler will decide parallelism for it according to the size of its consumed datasets. This can bring many benefits:
+The Adaptive Batch Scheduler is a batch job scheduler that can automatically adjust the execution plan. It currently supports automatically deciding parallelisms of operators for batch jobs. If an operator is not set with a parallelism, the scheduler will decide parallelism for it according to the size of its consumed datasets. This can bring many benefits:
 - Batch job users can be relieved from parallelism tuning
 - Automatically tuned parallelisms can better fit consumed datasets which have a varying volume size every day
 - Operators from SQL batch jobs can be assigned with different parallelisms which are automatically tuned
 
-### Usage
+At present, the Adaptive Batch Scheduler is the default scheduler for Flink batch jobs. No additional configuration is required unless other schedulers are explicitly configured, e.g. `jobmanager.scheduler: default`. Note that you need to
+leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2).
+
+### Automatically decide parallelisms for operators
+
+#### Usage
 
 To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to:
-- Configure to use Adaptive Batch Scheduler.
-- Set the parallelism of operators to `-1`.
+- Configure to automatically decide parallelisms for operators: 
   
-#### Configure to use Adaptive Batch Scheduler
-To use Adaptive Batch Scheduler, you need to:
-- Set `jobmanager.scheduler: AdaptiveBatch`.
-- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2).
-
-In addition, there are several related configuration options that may need adjustment when using Adaptive Batch Scheduler:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively.
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average size of data volume to expect each task instance to process. Note that when data skew occurs, or the decided parallelism reaches the max parallelism (due to too much data), the data actually processed by some tasks may far exceed this value.
-- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of data source.
-
-#### Set the parallelism of operators to `-1`
-Adaptive Batch Scheduler will only decide parallelism for operators whose parallelism is not specified by users (parallelism is `-1`). So if you want the parallelism of operators to be decided automatically, you should configure as follows:
-- Set `parallelism.default: -1`
-- Set `table.exec.resource.default-parallelism: -1` in SQL jobs.
-- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
-- Don't call `setParallelism()` on `StreamExecutionEnvironment/ExecutionEnvironment` in DataStream/DataSet jobs.
-
-### Performance tuning
+    Adaptive Batch Scheduler enables automatic parallelism derivation by default, you can configure [`execution.batch.adaptive.auto-parallelism.enabled`]({{< ref "docs/deployment/config" >}}#execution-batch-adaptive-auto-parallelism-enabled) to toggle this feature. 

Review Comment:
   `, you` -> `. You`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

