xingbe created FLINK-35426:
------------------------------
Summary: Change the distribution of DynamicFilteringDataCollector to Broadcast
Key: FLINK-35426
URL: https://issues.apache.org/jira/browse/FLINK-35426
Project: Flink
Issue Type: Improvement
Components: Table SQL / Planner
Affects Versions: 1.20.0
Reporter: xingbe
Fix For: 1.20.0
Currently, the DynamicFilteringDataCollector is used by the dynamic partition
pruning feature of batch jobs to collect the partition information dynamically
filtered by the source. Its data distribution method is rebalance, and it is
also an upstream vertex of the probe-side Source.
When the Scheduler dynamically infers the parallelism of a vertex that is both
a Source and a downstream vertex, it considers factors from both sides. Because
the DynamicFilteringDataCollector is upstream of the Source, this can lead to
an overestimation of the Source's parallelism. We aim to change the
distribution method of the DynamicFilteringDataCollector to broadcast to
prevent this overestimation.
Furthermore, given that the DynamicFilteringDataCollector transmits data
through the OperatorCoordinator rather than through normal data distribution,
this change will not affect the DPP (Dynamic Partition Pruning) functionality.
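
To make the overestimation concrete, here is a deliberately simplified toy
model in Java. It is not Flink's scheduler code. The class
SourceParallelismSketch, the method decide, the rule "take the maximum of the
source-side and input-side estimates", and the assumption that a broadcast
input does not drive the input-side estimate are all hypothetical and chosen
only for illustration.

```java
// Toy model only: every name and rule here is a hypothetical illustration,
// not Flink's actual parallelism inference logic.
public final class SourceParallelismSketch {

    /** How the upstream vertex distributes its output (illustrative). */
    public enum Distribution { REBALANCE, BROADCAST }

    private SourceParallelismSketch() {}

    /**
     * Estimates parallelism from one upstream input.
     * Assumption: a broadcast input contributes no estimate (returns 0),
     * while any other input asks for ceil(upstreamBytes / bytesPerTask) tasks.
     */
    static int inferFromInput(Distribution distribution, long upstreamBytes, long bytesPerTask) {
        if (distribution == Distribution.BROADCAST || upstreamBytes <= 0) {
            return 0;
        }
        // Ceiling division without floating point.
        return (int) ((upstreamBytes + bytesPerTask - 1) / bytesPerTask);
    }

    /**
     * Combines the source-side estimate with the input-side estimate.
     * Assumption: the vertex takes the larger of the two.
     */
    public static int decide(
            int sourceInferred, Distribution distribution, long upstreamBytes, long bytesPerTask) {
        return Math.max(sourceInferred, inferFromInput(distribution, upstreamBytes, bytesPerTask));
    }

    public static void main(String[] args) {
        // The source alone would ask for 4 tasks in both cases.
        System.out.println(decide(4, Distribution.REBALANCE, 1000L, 100L));
        System.out.println(decide(4, Distribution.BROADCAST, 1000L, 100L));
    }
}
```

In this toy model the rebalance input raises the result above the source's own
estimate, while the broadcast input leaves it untouched. The byte counts are
made-up numbers, not measurements from a real job.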