[jira] [Commented] (FLINK-31706) The default source parallelism should be the same as execution's default parallelism under adaptive batch scheduler

Lijie Wang (Jira) Thu, 11 May 2023 06:38:18 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721776#comment-17721776
 ]


Lijie Wang commented on FLINK-31706:
------------------------------------

I think it's a good idea to use {{parallelism.default}} instead of 
{{execution.batch.adaptive.auto-parallelism.default-source-parallelism}}.

Regarding the parallelism of the Source in the adaptive batch scheduler, we also 
have some other plans: dynamically inferring the Source parallelism at runtime 
(according to the amount of data that the Source actually needs to read after 
Dynamic Partition Pruning). One possible approach is for the source coordinator 
to infer the parallelism from information about the splits that will actually 
be consumed.

At that point, if the parallelism of the Source is not specified by the user, 
the source coordinator will be responsible for inferring the parallelism 
automatically (if the Source supports it). If the Source does not support 
inferring parallelism automatically, {{parallelism.default}} will be used as the 
parallelism of the Source. (An initial thought :))

> The default source parallelism should be the same as execution's default 
> parallelism under adaptive batch scheduler
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-31706
>                 URL: https://issues.apache.org/jira/browse/FLINK-31706
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Yun Tang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.18.0
>
>
> Currently, sources need to set 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} in 
> the adaptive batch scheduler mode; otherwise, the source parallelism is only 
> 1 by default. A better solution might be to use the default execution 
> parallelism if the user has not configured one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
