SinBex opened a new pull request, #24830:
URL: https://github.com/apache/flink/pull/24830
## What is the purpose of the change
Currently, the `DynamicFilteringDataCollector` is used in the dynamic
partition pruning (DPP) feature of batch jobs to collect the partition
information that the Source uses for dynamic filtering. Its output is
currently distributed via rebalance, and it acts as an upstream vertex of
the probe-side Source.
When the scheduler dynamically infers the parallelism of a vertex that is
both a downstream vertex and a Source, it takes factors from both roles into
account, which can overestimate the parallelism because the
`DynamicFilteringDataCollector` is an upstream of the Source. This change
switches the distribution method of the `DynamicFilteringDataCollector` to
broadcast to prevent this overestimation of the Source parallelism.
Furthermore, since the `DynamicFilteringDataCollector` transmits its data
through the OperatorCoordinator rather than through the regular data
exchange, this change does not affect the DPP functionality.
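As a side note, the difference between the two distribution methods can be
sketched with the public DataStream API. This is a minimal, hypothetical
example (the element values and job name are made up, and the actual change
lives in the planner/runtime internals, not in user code): with `rebalance()`
records are spread round-robin and the edge's data volume can feed into the
consumer's inferred parallelism, while with `broadcast()` every consumer
subtask receives all records.
```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BroadcastVsRebalanceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical stand-in for the dynamically filtered partition info.
        DataStream<String> filteringData = env.fromElements("p=2024-01", "p=2024-02");

        // Rebalance: records are distributed round-robin across downstream subtasks.
        DataStream<String> rebalanced = filteringData.rebalance();

        // Broadcast: every downstream subtask receives every record, so the
        // edge does not suggest a higher parallelism for the consumer.
        DataStream<String> broadcasted = filteringData.broadcast();

        rebalanced.print();
        broadcasted.print();
        env.execute("broadcast-vs-rebalance-sketch");
    }
}
```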
## Brief change log
- *Change the data distribution method of the DynamicFilteringDataCollector to broadcast*
## Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)