zhuzhurk commented on a change in pull request #18757:
URL: https://github.com/apache/flink/pull/18757#discussion_r829715509
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -170,9 +170,9 @@ To use Adaptive Batch Scheduler, you need to:
- Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config"
>}}#execution-batch-shuffle-mode) unset or explicitly set it to
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs
only"](#limitations-2).
In addition, there are several related configuration options that may need
adjustment when using Adaptive Batch Scheduler:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of
allowed parallelism to set adaptively
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of
allowed parallelism to set adaptively
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average
size of data volume to expect each task instance to process
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of
allowed parallelism to set adaptively. Currently, this option should be
configured as a power of 2, otherwise it will also be rounded up to a power of
2 automatically.
Review comment:
it will also be -> it will be
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -153,7 +153,7 @@ The behavior of Adaptive Scheduler is configured by [all
configuration options c
## Adaptive Batch Scheduler
-The Adaptive Batch Scheduler can automatically decide parallelisms of
operators for batch jobs. If an operator is not set with a parallelism, the
scheduler will decide parallelism for it according to the size of its consumed
datasets. This can bring many benefits:
+The Adaptive Batch Scheduler can automatically decide parallelisms of
operators for batch jobs. If an operator is not set with a parallelism, the
scheduler will decide parallelism for it according to the size of its consumed
datasets (Note that the decided parallelism can only be a power of 2, see ["The
decided parallelism can only be a power of 2"](#limitations-2) for details).
This can bring many benefits:
Review comment:
can only -> will
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for
operators whose parall
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
Exception will be thrown if a streaming job is submitted.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch
Scheduler only supports jobs whose [shuffle mode]({{< ref
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the
subpartitoins evenly consumed by downstream tasks, user should configure the
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
Review comment:
subpartitoins -> subpartitions
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for
operators whose parall
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
Exception will be thrown if a streaming job is submitted.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch
Scheduler only supports jobs whose [shuffle mode]({{< ref
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the
subpartitoins evenly consumed by downstream tasks, user should configure the
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
+- **No support for serveral file APIs**: No support for
`StreamExecutionEnvironment#readFile` `StreamExecutionEnvironment#readTextFile`
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)` and all data
sources using these APIs. When using these APIs, there will be a separate
monitoring task (called a `Custom File Source`) as a predecessor to the actual
data sources, which Adaptive Batch Scheduler cannot handle.
Review comment:
serveral -> several
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for
operators whose parall
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
Exception will be thrown if a streaming job is submitted.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch
Scheduler only supports jobs whose [shuffle mode]({{< ref
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the
subpartitoins evenly consumed by downstream tasks, user should configure the
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
Review comment:
> user should configure the XXX ...
maybe "the configuration XXX should be set to be a power of 2"
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for
operators whose parall
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
Exception will be thrown if a streaming job is submitted.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch
Scheduler only supports jobs whose [shuffle mode]({{< ref
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the
subpartitoins evenly consumed by downstream tasks, user should configure the
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
Review comment:
> In order to make the subpartitoins evenly consumed by downstream tasks
maybe "To ensure that downstream tasks consume the same number of
subpartitions"?
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for
operators whose parall
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
Exception will be thrown if a streaming job is submitted.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch
Scheduler only supports jobs whose [shuffle mode]({{< ref
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the
subpartitoins evenly consumed by downstream tasks, user should configure the
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
Review comment:
`M < N` -> `M <= N`
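To illustrate why the bound should be inclusive, here is a minimal sketch. It is not the scheduler's actual implementation: the class `PowerOfTwoSketch` and the helpers `roundUpToPowerOfTwo` and `decide` are hypothetical names, and the clamping logic is an assumption made only for this example. It shows that when the desired parallelism reaches the configured maximum (2^N), the decided parallelism can equal 2^N, i.e. M = N.

```java
public class PowerOfTwoSketch {

    /** Hypothetical helper (not a Flink API): true if value is a positive power of 2. */
    static boolean isPowerOfTwo(int value) {
        return value > 0 && (value & (value - 1)) == 0;
    }

    /** Hypothetical helper (not a Flink API): smallest power of 2 that is >= value. */
    static int roundUpToPowerOfTwo(int value) {
        if (value < 1 || value > (1 << 30)) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        int highest = Integer.highestOneBit(value);
        return highest == value ? value : highest << 1;
    }

    /**
     * Assumed decision rule for illustration only: clamp the desired parallelism
     * to [minPow2, maxPow2] and keep the result a power of 2.
     */
    static int decide(int desired, int minPow2, int maxPow2) {
        if (!isPowerOfTwo(minPow2) || !isPowerOfTwo(maxPow2) || minPow2 > maxPow2) {
            throw new IllegalArgumentException("bounds must be powers of 2 with min <= max");
        }
        // Clamp before rounding so the rounded value can never exceed maxPow2.
        int clamped = Math.min(Math.max(desired, 1), maxPow2);
        return Math.max(minPow2, roundUpToPowerOfTwo(clamped));
    }

    public static void main(String[] args) {
        // max = 128 = 2^7, so N = 7.
        System.out.println(decide(3, 8, 128));    // below the minimum, clamps to 8
        System.out.println(decide(20, 8, 128));   // rounds up to 32 (M = 5 < N)
        System.out.println(decide(100, 8, 128));  // rounds up to 128 (M = 7 = N)
        System.out.println(decide(1000, 8, 128)); // capped at 128 (M = 7 = N)
    }
}
```

The last two calls return the maximum itself, which is why the docs should say `M <= N` rather than `M < N`.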
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for
operators whose parall
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
Exception will be thrown if a streaming job is submitted.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch
Scheduler only supports jobs whose [shuffle mode]({{< ref
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the
subpartitoins evenly consumed by downstream tasks, user should configure the
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
+- **No support for serveral file APIs**: No support for
`StreamExecutionEnvironment#readFile` `StreamExecutionEnvironment#readTextFile`
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)` and all data
sources using these APIs. When using these APIs, there will be a separate
monitoring task (called a `Custom File Source`) as a predecessor to the actual
data sources, which Adaptive Batch Scheduler cannot handle.
Review comment:
> No support for serveral file APIs
maybe "FileInputFormat sources are not supported", and later in the
description state that "FileInputFormat sources include
`StreamExecutionEnvironment#readFile(...)`,
`StreamExecutionEnvironment#readTextFile(...)`, and
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)`"
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for
operators whose parall
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
Exception will be thrown if a streaming job is submitted.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch
Scheduler only supports jobs whose [shuffle mode]({{< ref
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the
subpartitoins evenly consumed by downstream tasks, user should configure the
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
+- **No support for serveral file APIs**: No support for
`StreamExecutionEnvironment#readFile` `StreamExecutionEnvironment#readTextFile`
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)` and all data
sources using these APIs. When using these APIs, there will be a separate
monitoring task (called a `Custom File Source`) as a predecessor to the actual
data sources, which Adaptive Batch Scheduler cannot handle.
Review comment:
> When using these APIs, there will be a separate monitoring task
> (called a `Custom File Source`) as a predecessor to the actual data sources,
> which Adaptive Batch Scheduler cannot handle.
Maybe "The implementation of FileInputFormat sources requires the parallelism to
be decided in advance, which conflicts with how Adaptive Batch Scheduler works."?
##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for
operators whose parall
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
Exception will be thrown if a streaming job is submitted.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch
Scheduler only supports jobs whose [shuffle mode]({{< ref
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the
subpartitoins evenly consumed by downstream tasks, user should configure the
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref
"docs/deployment/config"
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
+- **No support for serveral file APIs**: No support for
`StreamExecutionEnvironment#readFile` `StreamExecutionEnvironment#readTextFile`
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)` and all data
sources using these APIs. When using these APIs, there will be a separate
monitoring task (called a `Custom File Source`) as a predecessor to the actual
data sources, which Adaptive Batch Scheduler cannot handle.
Review comment:
I would also point users to the new sources
(`StreamExecutionEnvironment#fromSource(...)`) or `flink-connector-files` (with
a link) for reading files when using Adaptive Batch Scheduler.