[ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848801#comment-17848801
 ] 

Zhu Zhu commented on FLINK-35426:
---------------------------------

Good point! [~xiasun]
The task is assigned to you. Feel free to open a pr for it.

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> ---------------------------------------------------------------------
>
>                 Key: FLINK-35426
>                 URL: https://issues.apache.org/jira/browse/FLINK-35426
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner
>    Affects Versions: 1.20.0
>            Reporter: xingbe
>            Assignee: xingbe
>            Priority: Major
>             Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism for vertices 
> that are both downstream and Source, it considers factors from both sides, 
> which can lead to an overestimation of parallelism due to 
> DynamicFilteringDataCollector being an upstream of the Source. We aim to 
> change the distribution method of the DynamicFilteringDataCollector to 
> broadcast to prevent the dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to