Venkata krishnan Sowrirajan created FLINK-35165:
---------------------------------------------------

             Summary: AdaptiveBatch Scheduler should not restrict the default 
source parallelism to the max parallelism set
                 Key: FLINK-35165
                 URL: https://issues.apache.org/jira/browse/FLINK-35165
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
            Reporter: Venkata krishnan Sowrirajan


Copy-pasting the reasoning mentioned on this [discussion 
thread|https://lists.apache.org/thread/o887xhvvmn2rg5tyymw348yl2mqt23o7].

Let me state why I think 
"{_}jobmanager.adaptive-batch-scheduler.default-source-parallelism{_}" should 
not be bound by the "{_}jobmanager.adaptive-batch-scheduler.max-parallelism{_}".
 * The source vertex is unique in that it has no upstream vertices. 
Downstream vertices read shuffled data partitioned by key, which is not the 
case for the source vertex
 * Limiting source parallelism by downstream vertices' max parallelism is 
incorrect
 * If we say that, for "semantic consistency", the source vertex parallelism 
has to be bound by the overall job's max parallelism, it can lead to the 
following issues:
 ** Jobs with high filter selectivity and huge amounts of data to read need a 
high source parallelism, even though the downstream vertices receive little 
data
 ** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" 
just so that the source parallelism can be raised can lead to small blocks 
and sub-optimal performance.
 ** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" 
also requires careful tuning of the network buffer configuration, which is 
unnecessary work when the high value is needed only to raise the source 
parallelism.
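
To make the difference concrete, here is a minimal sketch of the two 
behaviours being compared. It is not Flink's actual scheduler code: the class 
name, the method names and the example values (2000 and 200) are all 
hypothetical, and the "capped" variant only models the restriction described 
in this issue as a simple minimum.

```java
/** Illustrative only: a simplified model, not the AdaptiveBatchScheduler implementation. */
final class SourceParallelismSketch {

    // Behaviour described in this issue (assumed to be a plain clamp):
    // default-source-parallelism is bounded by max-parallelism.
    static int cappedSourceParallelism(int defaultSourceParallelism, int maxParallelism) {
        return Math.min(defaultSourceParallelism, maxParallelism);
    }

    // Behaviour proposed in this issue: the source parallelism is taken as
    // configured, and max-parallelism only applies to downstream vertices.
    static int uncappedSourceParallelism(int defaultSourceParallelism, int maxParallelism) {
        return defaultSourceParallelism;
    }

    public static void main(String[] args) {
        int defaultSourceParallelism = 2000; // hypothetical: large scan with a selective filter
        int maxParallelism = 200;            // hypothetical: enough for the small shuffled output

        System.out.println("capped:   "
                + cappedSourceParallelism(defaultSourceParallelism, maxParallelism));
        System.out.println("uncapped: "
                + uncappedSourceParallelism(defaultSourceParallelism, maxParallelism));
    }
}
```

With the capped variant, the only way to get more source subtasks is to raise 
max-parallelism for the whole job, which is what triggers the small-block and 
network-buffer issues listed above.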



