Venkata krishnan Sowrirajan created FLINK-35165:
---------------------------------------------------
Summary: AdaptiveBatch Scheduler should not restrict the default
source parallelism to the max parallelism set
Key: FLINK-35165
URL: https://issues.apache.org/jira/browse/FLINK-35165
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Reporter: Venkata krishnan Sowrirajan
Copy-pasting the reasoning mentioned on this [discussion
thread|https://lists.apache.org/thread/o887xhvvmn2rg5tyymw348yl2mqt23o7].
Let me state why I think
"jobmanager.adaptive-batch-scheduler.default-source-parallelism" should
not be bound by "jobmanager.adaptive-batch-scheduler.max-parallelism".
* The source vertex is unique in that it has no upstream vertices.
Downstream vertices read shuffled data partitioned by key, which is not the
case for the source vertex.
* Limiting the source parallelism by the downstream vertices' max parallelism
is therefore incorrect.
* If we say that, for "semantic consistency", the source vertex parallelism has
to be bound by the overall job's max parallelism, it can lead to the following
issues:
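** With high filter selectivity and huge amounts of data to read, the source
needs far more parallelism than the downstream vertices, which only see the
filtered output.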
** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" just
so that the source parallelism can be set higher can lead to small blocks and
sub-optimal performance.
** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*"
requires careful tuning of the network buffer configuration, which is
unnecessary work when the only reason for raising it is to allow a higher
source parallelism.
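
To make the difference concrete, here is a minimal sketch of the arithmetic only. It is not Flink's scheduler code: the function name, the bound_by_max flag and the numbers 2000 and 500 are hypothetical, chosen to illustrate the restriction described in this issue and the behaviour being asked for.

```python
def source_parallelism_upper_bound(default_source_parallelism: int,
                                   max_parallelism: int,
                                   bound_by_max: bool) -> int:
    """Toy model of the upper bound applied to the source vertex.

    bound_by_max=True models the restriction described in this issue:
    the source bound is clamped to the job-wide max parallelism.
    bound_by_max=False models the requested behaviour: the source bound
    is taken from default-source-parallelism alone.
    """
    if bound_by_max:
        return min(default_source_parallelism, max_parallelism)
    return default_source_parallelism


# Hypothetical settings: a wide scan feeding a small downstream stage.
default_source = 2000  # ...adaptive-batch-scheduler.default-source-parallelism
job_max = 500          # ...adaptive-batch-scheduler.max-parallelism

restricted = source_parallelism_upper_bound(default_source, job_max, bound_by_max=True)
requested = source_parallelism_upper_bound(default_source, job_max, bound_by_max=False)

print("restricted bound:", restricted)
print("requested bound:", requested)
```

With the clamp in place, the only way to get the source to 2000 in this toy model is to raise the job-wide max parallelism to 2000 as well, which is exactly the tuning burden listed above.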
--
This message was sent by Atlassian Jira
(v8.20.10#820010)