zhuzhurk commented on a change in pull request #18757:
URL: https://github.com/apache/flink/pull/18757#discussion_r811538415
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -151,5 +151,45 @@ The behavior of Adaptive Scheduler is configured by [all
configuration options c
- **Unused slots**: If the max parallelism for slot sharing groups is not
equal, slots offered to Adaptive Scheduler might be unused.
- Scaling events trigger job and task restarts, which will increase the number
of Task attempts.
+## Adaptive Batch Scheduler
+
+The Adaptive Batch Scheduler can automatically decide parallelisms of
operators for batch jobs. If an operator is not set with a parallelism, the
scheduler will decide parallelism for it according to the size of its consumed
datasets. This can bring many benefits:
+- Batch job users can be relieved from parallelism tuning
+- Automatically tuned parallelisms can better fit consumed datasets which have
a varying volume size every day
+- Operators from SQL batch jobs can be assigned with different parallelisms
which are automatically tuned
+
+### Usage
+
+To automatically decide parallelisms for operators with Adaptive Batch
Scheduler, you need to:
+- Configure to use Adaptive Batch Scheduler.
+- Set the parallelism of operators to `-1`.
+
+#### Configure to use Adaptive Batch Scheduler
+To use Adaptive Batch Scheduler, you need to:
+- Set `jobmanager.scheduler: AdaptiveBatch`.
+- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) unset or explicitly set it to `ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#limitations-2).
+
+In addition, there are several related configuration options that may need
adjustment when using Adaptive Batch Scheduler:
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively
+- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively
+- [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): The size of data volume to expect each task instance to process
+- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of data source.
+
+#### Set the parallelism of operators to `-1`
+Adaptive Batch Scheduler will only decide parallelism for operators whose
parallelism is not specified by users (parallelism is `-1`). So if you want the
parallelism of operators to be decided automatically, you should configure as
follows:
+- Set `parallelism.default: -1`
+- Set `table.exec.resource.default-parallelism: -1` in SQL jobs.
+- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.
Review comment:
Maybe add one line asking users not to invoke `setParallelism(...)` on
`ExecutionEnvironment` or `StreamExecutionEnvironment`?
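
For context, the cluster-level settings that the quoted section describes could be collected into a `flink-conf.yaml` fragment like the sketch below. It uses only keys and values named in the diff; putting them together in one file is an assumption made for illustration. SQL jobs would additionally set `table.exec.resource.default-parallelism: -1`, as the diff states.

```yaml
# Select the Adaptive Batch Scheduler (key and value as given in the diff).
jobmanager.scheduler: AdaptiveBatch

# Leave the default parallelism unspecified (-1) so the scheduler can decide it.
parallelism.default: -1

# execution.batch-shuffle-mode is intentionally not set here, so its default
# (ALL-EXCHANGES-BLOCKING, per the diff) applies.
```

A `setParallelism(...)` call on the environment or on an individual operator would specify a parallelism explicitly, which per the quoted text excludes those operators from automatic sizing.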