[ https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848801#comment-17848801 ]

Zhu Zhu commented on FLINK-35426:
---------------------------------

Good point! [~xiasun]
The task is assigned to you. Feel free to open a pr for it.

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> ---------------------------------------------------------------------
>
>                 Key: FLINK-35426
>                 URL: https://issues.apache.org/jira/browse/FLINK-35426
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner
>    Affects Versions: 1.20.0
>            Reporter: xingbe
>            Assignee: xingbe
>            Priority: Major
>             Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic
> partition pruning feature of batch jobs to collect the partition information
> dynamically filtered by the source. Its current data distribution method is
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism for vertices
> that are both downstream and Source, it considers factors from both sides,
> which can lead to an overestimation of parallelism due to
> DynamicFilteringDataCollector being an upstream of the Source. We aim to
> change the distribution method of the DynamicFilteringDataCollector to
> broadcast to prevent the dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data
> through the OperatorCoordinator rather than through normal data distribution,
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)