xingbe created FLINK-35426:
------------------------------

             Summary: Change the distribution of DynamicFilteringDataCollector to Broadcast
                 Key: FLINK-35426
                 URL: https://issues.apache.org/jira/browse/FLINK-35426
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Planner
    Affects Versions: 1.20.0
            Reporter: xingbe
             Fix For: 1.20.0


Currently, the DynamicFilteringDataCollector is used by the dynamic partition 
pruning feature of batch jobs to collect the partition information that is 
dynamically filtered by the source. Its output is currently distributed with 
rebalance, and it is also an upstream vertex of the probe-side Source.

When the Scheduler dynamically infers the parallelism of a vertex that is both 
a Source and a downstream vertex (that is, a Source with upstream inputs), it 
considers factors from both sides. Because the DynamicFilteringDataCollector 
is an upstream of the probe-side Source, this can cause the Source parallelism 
to be overestimated. We propose changing the distribution method of the 
DynamicFilteringDataCollector to broadcast so that the Source parallelism is 
no longer overestimated.

Furthermore, given that the DynamicFilteringDataCollector transmits data 
through the OperatorCoordinator rather than through normal data distribution, 
this change will not affect the DPP (Dynamic Partition Pruning) functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
