[
https://issues.apache.org/jira/browse/FLINK-35165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-35165:
-----------------------------------
Labels: pull-request-available (was: )
> AdaptiveBatch Scheduler should not restrict the default source parallelism to
> the max parallelism set
> -----------------------------------------------------------------------------------------------------
>
> Key: FLINK-35165
> URL: https://issues.apache.org/jira/browse/FLINK-35165
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Reporter: Venkata krishnan Sowrirajan
> Priority: Major
> Labels: pull-request-available
>
> Copy-pasting the reasoning mentioned on this [discussion
> thread|https://lists.apache.org/thread/o887xhvvmn2rg5tyymw348yl2mqt23o7].
> Let me state why I think
> "{_}jobmanager.adaptive-batch-scheduler.default-source-parallelism{_}" should
> not be bound by the
> "{_}jobmanager.adaptive-batch-scheduler.max-parallelism{_}".
> * The source vertex is unique in that it has no upstream vertices.
> Downstream vertices read shuffled data partitioned by key, which is not the
> case for the source vertex.
> * Limiting source parallelism by downstream vertices' max parallelism is
> incorrect
> * If we say that, for "semantic consistency", the source vertex parallelism
> has to be bound by the overall job's max parallelism, it can lead to the
> following issues:
> ** Sources with high filter selectivity and huge amounts of data to read may
> need more parallelism than the cap allows.
> ** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*"
> only so that source parallelism can be set higher can lead to small blocks
> and sub-optimal performance.
> ** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*"
> requires careful tuning of network buffer configurations, an effort that is
> unnecessary when the only goal is to set the source parallelism high.
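
To make the argument above concrete, here is a minimal, self-contained Java
sketch of the two behaviors being compared. It is not Flink's scheduler code:
the class name, both helper methods, and the numeric values are hypothetical
and only illustrate the capping described in this issue. The two integers
stand in for "jobmanager.adaptive-batch-scheduler.default-source-parallelism"
and "jobmanager.adaptive-batch-scheduler.max-parallelism".

```java
public class SourceParallelismSketch {

    // Hypothetical helper modelling the behavior this issue objects to:
    // the default source parallelism is capped by the max parallelism.
    static int cappedSourceParallelism(int defaultSourceParallelism, int maxParallelism) {
        return Math.min(defaultSourceParallelism, maxParallelism);
    }

    // Hypothetical helper modelling the proposed behavior: the source
    // parallelism is taken as configured, independent of the max parallelism
    // that applies to downstream vertices.
    static int uncappedSourceParallelism(int defaultSourceParallelism, int maxParallelism) {
        return defaultSourceParallelism;
    }

    public static void main(String[] args) {
        // Illustrative values only, not recommended settings.
        int defaultSourceParallelism = 2000;
        int maxParallelism = 128;

        System.out.println("capped:   "
                + cappedSourceParallelism(defaultSourceParallelism, maxParallelism));
        System.out.println("uncapped: "
                + uncappedSourceParallelism(defaultSourceParallelism, maxParallelism));
    }
}
```

With the capped variant, the only way to reach a source parallelism of 2000 is
to raise the max parallelism to at least 2000, which also raises the limit for
every downstream vertex. That is the coupling the bullets above argue against.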