SinBex opened a new pull request, #24830:
URL: https://github.com/apache/flink/pull/24830

   ## What is the purpose of the change
   
   Currently, the DynamicFilteringDataCollector is used by the dynamic 
partition pruning feature of batch jobs to collect the partition information 
used to dynamically filter the source. Its current data distribution method is 
rebalance, and it also acts as an upstream vertex of the probe-side Source.
   
   When the Scheduler dynamically infers the parallelism of a vertex that is 
both a Source and a downstream vertex, it considers factors from both sides. 
Because the DynamicFilteringDataCollector is an upstream of the Source, this 
can lead to an overestimation of the Source parallelism. This change switches 
the distribution method of the DynamicFilteringDataCollector to broadcast to 
prevent that overestimation.
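
   The effect can be illustrated with a toy model. This is only a sketch 
under stated assumptions, not the Scheduler's actual code: the class 
`ParallelismSketch`, its `Input` type, and the rule "take the maximum of the 
source-side inference and a size-based estimate over non-broadcast inputs" are 
all hypothetical simplifications made up for this illustration.

```java
import java.util.List;

// Toy model only: all names and the decision rule are assumptions for illustration.
final class ParallelismSketch {

    /** Hypothetical description of one input edge of a vertex. */
    static final class Input {
        final long bytes;
        final boolean broadcast;

        Input(long bytes, boolean broadcast) {
            this.bytes = bytes;
            this.broadcast = broadcast;
        }
    }

    /** Size-based estimate; in this model only non-broadcast inputs contribute. */
    static int fromInputs(List<Input> inputs, long bytesPerTask) {
        long nonBroadcast = 0;
        for (Input in : inputs) {
            if (!in.broadcast) {
                nonBroadcast += in.bytes;
            }
        }
        // Ceiling division, with a floor of one task.
        return (int) Math.max(1, (nonBroadcast + bytesPerTask - 1) / bytesPerTask);
    }

    /** A vertex that is both Source and downstream takes the larger of the two sides. */
    static int decide(int sourceInferred, List<Input> inputs, long bytesPerTask) {
        return Math.max(sourceInferred, fromInputs(inputs, bytesPerTask));
    }

    public static void main(String[] args) {
        List<Input> rebalance = List.of(new Input(4096, false));
        List<Input> broadcast = List.of(new Input(4096, true));
        System.out.println(decide(2, rebalance, 1024));
        System.out.println(decide(2, broadcast, 1024));
    }
}
```

   In this model, the rebalance edge pushes the result above the source-side 
inference, while the broadcast edge leaves the source-side inference as the 
deciding factor.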
   
   Furthermore, given that the DynamicFilteringDataCollector transmits data 
through the OperatorCoordinator rather than through normal data distribution, 
this change will not affect the DPP (Dynamic Partition Pruning) functionality.
   
   
   ## Brief change log
   
     - *Change the distribution of DynamicFilteringDataCollector to Broadcast*
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
   

