[jira] [Updated] (FLINK-35165) AdaptiveBatch Scheduler should not restrict the default source parallelism to the max parallelism set

ASF GitHub Bot (Jira) Sun, 28 Apr 2024 11:00:11 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-35165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated FLINK-35165:
-----------------------------------
    Labels: pull-request-available  (was: )

> AdaptiveBatch Scheduler should not restrict the default source parallelism to 
> the max parallelism set
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35165
>                 URL: https://issues.apache.org/jira/browse/FLINK-35165
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>            Reporter: Venkata krishnan Sowrirajan
>            Priority: Major
>              Labels: pull-request-available
>
> Copy-pasting the reasoning mentioned on this [discussion 
> thread|https://lists.apache.org/thread/o887xhvvmn2rg5tyymw348yl2mqt23o7].
> Let me state why I think 
> "{_}jobmanager.adaptive-batch-scheduler.default-source-parallelism{_}" should 
> not be bound by the 
> "{_}jobmanager.adaptive-batch-scheduler.max-parallelism{_}".
>  * The source vertex is unique in that it has no upstream vertices. 
> Downstream vertices read shuffled data partitioned by key, which is not the 
> case for the source vertex.
>  * Limiting source parallelism by downstream vertices' max parallelism is 
> incorrect
>  * If we say that, for "semantic consistency", the source vertex parallelism 
> has to be bound by the overall job's max parallelism, it can lead to the 
> following issues:
>  ** High filter selectivity with huge amounts of data to read
>  ** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" 
> just so that the source parallelism can be raised can lead to small blocks 
> and sub-optimal performance.
>  ** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" 
> also requires careful tuning of the network buffer configuration, which is 
> unnecessary work when the only goal is a higher source parallelism.
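
To make the complaint above concrete, here is a minimal toy model of the two behaviours being contrasted. It is only a sketch and not Flink's scheduler code: the function name, the `cap_by_max` flag, and the values 2000 and 500 are hypothetical, chosen for illustration. Only the two option keys come from the issue text.

```python
# Toy model only; names and numbers below are assumptions, not Flink internals.

def effective_source_parallelism(default_source_parallelism: int,
                                 max_parallelism: int,
                                 cap_by_max: bool) -> int:
    """Return the parallelism a source vertex would get in this toy model."""
    if default_source_parallelism < 1 or max_parallelism < 1:
        raise ValueError("parallelism values must be positive")
    if cap_by_max:
        # Behaviour the issue objects to: the source is clamped to max-parallelism.
        return min(default_source_parallelism, max_parallelism)
    # Behaviour the issue argues for: the source setting is honoured as given.
    return default_source_parallelism


# Hypothetical settings, keyed by the option names quoted in the issue.
settings = {
    "jobmanager.adaptive-batch-scheduler.default-source-parallelism": 2000,
    "jobmanager.adaptive-batch-scheduler.max-parallelism": 500,
}

source = settings["jobmanager.adaptive-batch-scheduler.default-source-parallelism"]
maximum = settings["jobmanager.adaptive-batch-scheduler.max-parallelism"]

capped = effective_source_parallelism(source, maximum, cap_by_max=True)
uncapped = effective_source_parallelism(source, maximum, cap_by_max=False)
```

In this model, the only way to get the capped source up to 2000 is to raise the max parallelism for the whole job, which is the side effect the last two bullets describe.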



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
