[ https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848800#comment-17848800 ]
xingbe commented on FLINK-35426:
--------------------------------
[~zhuzh] Could you please assign this ticket to me? Thanks.
> Change the distribution of DynamicFilteringDataCollector to Broadcast
> ---------------------------------------------------------------------
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.20.0
> Reporter: xingbe
> Priority: Major
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is used by the dynamic
> partition pruning (DPP) feature of batch jobs to collect the partition
> information used to dynamically filter the source. Its data distribution
> method is rebalance, and it is also an upstream vertex of the probe-side
> Source. When the Scheduler dynamically infers the parallelism of a vertex
> that is both a Source and a downstream vertex, it considers factors from
> both sides, so having the DynamicFilteringDataCollector upstream of the
> Source can lead to an overestimated parallelism. We aim to change the
> distribution method of the DynamicFilteringDataCollector to broadcast to
> prevent this overestimation of Source parallelism. Because the
> DynamicFilteringDataCollector transmits its data through the
> OperatorCoordinator rather than through normal data distribution, this
> change will not affect DPP functionality.
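
The overestimation described in the quoted issue can be illustrated with a toy model. This is only a sketch and not Flink's scheduler code: the class `SourceParallelismSketch`, the `Input` type, the `bytesPerTask` parameter, and the rule that broadcast inputs are excluded from the input-based estimate are all assumptions made for illustration.

```java
import java.util.List;

// Toy model only: NOT Flink's actual scheduler logic. All names are hypothetical.
public class SourceParallelismSketch {

    /** One upstream input of the Source vertex (hypothetical type). */
    static final class Input {
        final long bytes;
        final boolean broadcast;

        Input(long bytes, boolean broadcast) {
            this.bytes = bytes;
            this.broadcast = broadcast;
        }
    }

    /**
     * Combines the parallelism inferred by the source itself with an estimate
     * derived from upstream data volume. Assumption: only non-broadcast inputs
     * contribute to the input-based estimate.
     */
    static int inferParallelism(int sourceInferred, List<Input> inputs, long bytesPerTask) {
        long nonBroadcastBytes = 0;
        for (Input in : inputs) {
            if (!in.broadcast) {
                nonBroadcastBytes += in.bytes;
            }
        }
        // Ceiling division: number of tasks needed to consume the non-broadcast bytes.
        int inputBased = (int) ((nonBroadcastBytes + bytesPerTask - 1) / bytesPerTask);
        // "Considers factors from both sides": take the larger estimate, at least 1.
        return Math.max(1, Math.max(sourceInferred, inputBased));
    }

    public static void main(String[] args) {
        long bytesPerTask = 1_000L;
        // Collector output reaches the Source over a rebalance edge.
        int rebalance = inferParallelism(4, List.of(new Input(10_000L, false)), bytesPerTask);
        // Same output, but over a broadcast edge.
        int broadcast = inferParallelism(4, List.of(new Input(10_000L, true)), bytesPerTask);
        System.out.println("rebalance edge: " + rebalance);
        System.out.println("broadcast edge: " + broadcast);
    }
}
```

In this model the rebalance edge lets the collector's output volume push the Source's parallelism above what the source itself inferred, while the broadcast edge leaves the source-side estimate in control.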