zhuzhurk commented on a change in pull request #18757:
URL: https://github.com/apache/flink/pull/18757#discussion_r829715509



##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -170,9 +170,9 @@ To use Adaptive Batch Scheduler, you need to:
 - Leave the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" 
>}}#execution-batch-shuffle-mode) unset or explicitly set it to 
`ALL-EXCHANGES-BLOCKING` (default value) due to ["ALL-EXCHANGES-BLOCKING jobs 
only"](#limitations-2).
 
 In addition, there are several related configuration options that may need 
adjustment when using Adaptive Batch Scheduler:
-- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively
-- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of 
allowed parallelism to set adaptively
-- [`jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-avg-data-volume-per-task): The average 
size of data volume to expect each task instance to process
+- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of 
allowed parallelism to set adaptively. Currently, this option should be 
configured as a power of 2, otherwise it will also be rounded up to a power of 
2 automatically.

Review comment:
       it will also be -> it will be
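
       For illustration, the rounding described in the added sentence could look like the sketch below. This is only an assumption about the behavior ("rounded up to a power of 2"), not Flink's actual implementation; the class `PowerOfTwoRounding` and the method `roundUpToPowerOfTwo` are hypothetical names.

```java
/** Hypothetical helper for illustration only; not taken from the Flink code base. */
final class PowerOfTwoRounding {

    /** Returns the smallest power of two that is greater than or equal to value. */
    static int roundUpToPowerOfTwo(int value) {
        // 2^30 is the largest power of two that fits in a positive int.
        if (value < 1 || value > (1 << 30)) {
            throw new IllegalArgumentException("value must be in [1, 2^30]: " + value);
        }
        // highestOneBit keeps only the most significant set bit of value.
        int highest = Integer.highestOneBit(value);
        // A power of two has exactly one set bit, so it is returned unchanged.
        return highest == value ? value : highest << 1;
    }

    public static void main(String[] args) {
        // Example values a user might configure as min-parallelism (assumed inputs).
        for (int configured : new int[] {1, 3, 8, 100}) {
            System.out.println(configured + " -> " + roundUpToPowerOfTwo(configured));
        }
    }
}
```

       A value that is already a power of 2 is left as is, which is why "it will be rounded up" reads better than "it will also be rounded up".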

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -153,7 +153,7 @@ The behavior of Adaptive Scheduler is configured by [all 
configuration options c
 
 ## Adaptive Batch Scheduler
 
-The Adaptive Batch Scheduler can automatically decide parallelisms of 
operators for batch jobs. If an operator is not set with a parallelism, the 
scheduler will decide parallelism for it according to the size of its consumed 
datasets. This can bring many benefits:
+The Adaptive Batch Scheduler can automatically decide parallelisms of 
operators for batch jobs. If an operator is not set with a parallelism, the 
scheduler will decide parallelism for it according to the size of its consumed 
datasets (Note that the decided parallelism can only be a power of 2, see ["The 
decided parallelism can only be a power of 2"](#limitations-2) for details). 
This can bring many benefits:

Review comment:
       can only -> will

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for 
operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. 
Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the 
subpartitoins evenly consumed by downstream tasks, user should configure the 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2 
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).

Review comment:
       subpartitoins -> subpartitions

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for 
operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. 
Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the 
subpartitoins evenly consumed by downstream tasks, user should configure the 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2 
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
+- **No support for serveral file APIs**: No support for 
`StreamExecutionEnvironment#readFile` `StreamExecutionEnvironment#readTextFile` 
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)` and all data 
sources using these APIs. When using these APIs, there will be a separate 
monitoring task (called a `Custom File Source`) as a predecessor to the actual 
data sources, which Adaptive Batch Scheduler cannot handle.

Review comment:
       serveral -> several

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for 
operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. 
Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the 
subpartitoins evenly consumed by downstream tasks, user should configure the 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2 
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).

Review comment:
       > user should configure the XXX ...
   
   maybe "the configuration XXX should be set to a power of 2"

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for 
operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. 
Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the 
subpartitoins evenly consumed by downstream tasks, user should configure the 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2 
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).

Review comment:
       > In order to make the subpartitoins evenly consumed by downstream tasks
   
   maybe "To ensure that downstream tasks consume the same number of 
subpartitions"?

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for 
operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. 
Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the 
subpartitoins evenly consumed by downstream tasks, user should configure the 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2 
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).

Review comment:
       `M < N` -> `M <= N`
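
       To make the reasoning behind `M <= N` concrete, here is a minimal sketch. It assumes that an upstream task produces 2^N subpartitions (one per unit of max parallelism) and that downstream task `i` consumes a contiguous index range. The class `SubpartitionRanges` and the range formula are hypothetical and only illustrate the arithmetic, not Flink's implementation.

```java
import java.util.Arrays;

/** Hypothetical illustration of how subpartitions split across downstream tasks. */
final class SubpartitionRanges {

    /** Returns how many subpartitions each of the downstream tasks consumes. */
    static int[] subpartitionCounts(int numSubpartitions, int parallelism) {
        if (parallelism < 1 || parallelism > numSubpartitions) {
            throw new IllegalArgumentException("parallelism must be in [1, numSubpartitions]");
        }
        int[] counts = new int[parallelism];
        for (int i = 0; i < parallelism; i++) {
            // Task i consumes the subpartition index range [start, end).
            long start = (long) i * numSubpartitions / parallelism;
            long end = (long) (i + 1) * numSubpartitions / parallelism;
            counts[i] = (int) (end - start);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 2^3 subpartitions, parallelism 2^2: every task gets the same share.
        System.out.println(Arrays.toString(subpartitionCounts(8, 4))); // [2, 2, 2, 2]
        // M == N is valid too: each task consumes exactly one subpartition.
        System.out.println(Arrays.toString(subpartitionCounts(8, 8))); // [1, 1, 1, 1, 1, 1, 1, 1]
        // A parallelism that is not a power of 2 gives an uneven split.
        System.out.println(Arrays.toString(subpartitionCounts(8, 3))); // [2, 3, 3]
    }
}
```

       With 2^N subpartitions and a parallelism of 2^M, each task consumes 2^(N-M) subpartitions, and the case M = N is simply one subpartition per task.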

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for 
operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. 
Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the 
subpartitoins evenly consumed by downstream tasks, user should configure the 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2 
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
+- **No support for serveral file APIs**: No support for 
`StreamExecutionEnvironment#readFile` `StreamExecutionEnvironment#readTextFile` 
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)` and all data 
sources using these APIs. When using these APIs, there will be a separate 
monitoring task (called a `Custom File Source`) as a predecessor to the actual 
data sources, which Adaptive Batch Scheduler cannot handle.

Review comment:
       > No support for serveral file APIs
   
   maybe "FileInputFormat sources are not supported", and later in the 
description state that "FileInputFormat sources include 
`StreamExecutionEnvironment#readFile(...)`, 
`StreamExecutionEnvironment#readTextFile(...)` 
and `StreamExecutionEnvironment#createInput(FileInputFormat, ...)`"

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for 
operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. 
Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the 
subpartitoins evenly consumed by downstream tasks, user should configure the 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2 
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
+- **No support for serveral file APIs**: No support for 
`StreamExecutionEnvironment#readFile` `StreamExecutionEnvironment#readTextFile` 
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)` and all data 
sources using these APIs. When using these APIs, there will be a separate 
monitoring task (called a `Custom File Source`) as a predecessor to the actual 
data sources, which Adaptive Batch Scheduler cannot handle.

Review comment:
       > When using these APIs, there will be a separate monitoring task 
(called a `Custom File Source`) as a predecessor to the actual data sources, 
which Adaptive Batch Scheduler cannot handle.
   
   Maybe "The implementation of FileInputFormat sources requires the parallelism 
to be decided in advance, which goes against how Adaptive Batch Scheduler 
works."?

##########
File path: docs/content/docs/deployment/elastic_scaling.md
##########
@@ -191,6 +191,8 @@ Adaptive Batch Scheduler will only decide parallelism for 
operators whose parall
 
 - **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs. 
Exception will be thrown if a streaming job is submitted.
 - **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch 
Scheduler only supports jobs whose [shuffle mode]({{< ref 
"docs/deployment/config" >}}#execution-batch-shuffle-mode) is 
`ALL-EXCHANGES-BLOCKING`.
+- **The decided parallelism can only be a power of 2**: In order to make the 
subpartitoins evenly consumed by downstream tasks, user should configure the 
[`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref 
"docs/deployment/config" 
>}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to be a power of 2 
(2^N), and the decided parallelism will also be a power of 2 (2^M and M < N).
+- **No support for serveral file APIs**: No support for 
`StreamExecutionEnvironment#readFile` `StreamExecutionEnvironment#readTextFile` 
`StreamExecutionEnvironment#createInput(FileInputFormat, ...)` and all data 
sources using these APIs. When using these APIs, there will be a separate 
monitoring task (called a `Custom File Source`) as a predecessor to the actual 
data sources, which Adaptive Batch Scheduler cannot handle.

Review comment:
       I would also point users to the new sources 
(`StreamExecutionEnvironment#fromSource(...)`) or `flink-connector-files` (with 
a link) for reading files when using Adaptive Batch Scheduler.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


