Thesharing opened a new pull request #15310:
URL: https://github.com/apache/flink/pull/15310


   ## What is the purpose of the change
   
   *This pull request optimizes the initialization of 
PipelinedRegionSchedulingStrategy.*
   
   *PipelinedRegionSchedulingStrategy is used for task scheduling. The 
bottleneck in initializing it is computing the result partitions consumed by 
each pipelined region, as well as the pipelined regions that consume each 
result partition.*
   
   *For a batch job made up of all-to-all blocking edges, each region 
consists of a single vertex, so both the time complexity and the space 
complexity degrade to O(N^2).*
   
   *Based on FLINK-21328, the consumedResults in 
DefaultSchedulingPipelinedRegion can be replaced with ConsumedPartitionGroup in 
DefaultExecutionVertex.*
   
   *The complexity of initializing PipelinedRegionSchedulingStrategy decreases 
from O(N^2) to O(N).*
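
The difference between the two shapes can be illustrated with a small, self-contained Java sketch. It is not Flink code: `PartitionGroup`, `referencesWithPerRegionSets`, and `referencesWithSharedGroup` are hypothetical names, and the sketch assumes one all-to-all blocking edge with N upstream result partitions and N downstream single-vertex regions. It only counts how many partition references each layout has to store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConsumedPartitionGroupSketch {

    /** Hypothetical stand-in for a group of result partitions that are always consumed together. */
    static final class PartitionGroup {
        final List<Integer> partitionIds;

        PartitionGroup(List<Integer> partitionIds) {
            this.partitionIds = Collections.unmodifiableList(partitionIds);
        }
    }

    /** Per-region layout: every region builds its own set holding all N partition IDs. */
    public static long referencesWithPerRegionSets(int n) {
        long stored = 0;
        List<Set<Integer>> consumedByRegion = new ArrayList<>();
        for (int region = 0; region < n; region++) {
            Set<Integer> consumed = new HashSet<>();
            for (int partition = 0; partition < n; partition++) {
                consumed.add(partition);
            }
            consumedByRegion.add(consumed);
            stored += consumed.size(); // N entries per region, N regions
        }
        return stored; // N * N
    }

    /** Grouped layout: the N partition IDs are stored once and every region points at the same group. */
    public static long referencesWithSharedGroup(int n) {
        List<Integer> ids = new ArrayList<>();
        for (int partition = 0; partition < n; partition++) {
            ids.add(partition);
        }
        PartitionGroup shared = new PartitionGroup(ids);
        long stored = shared.partitionIds.size(); // N entries, stored once

        List<PartitionGroup> groupByRegion = new ArrayList<>();
        for (int region = 0; region < n; region++) {
            groupByRegion.add(shared); // one reference per region
            stored += 1;
        }
        return stored; // N + N
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("per-region sets: " + referencesWithPerRegionSets(n));
        System.out.println("shared group:    " + referencesWithSharedGroup(n));
    }
}
```

With N = 1000 the per-region layout stores 1,000,000 references, while the shared-group layout stores 2,000, which matches the O(N^2) versus O(N) behavior described above.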
   
   *For more details, please check FLINK-21330.*
   
   
   ## Brief change log
   
     - *Add partitionNum for TestingSchedulingResultPartition*
     - *Add getConsumedPartitionGroups for SchedulingPipelinedRegion*
     - *Optimize the initialization of PipelinedRegionSchedulingStrategy*
     - *Optimize PipelinedRegionSchedulingStrategy#maybeScheduleRegion*
   
   
   ## Verifying this change
   
   *Since this optimization does not change the original logic of the 
initialization of PipelinedRegionSchedulingStrategy, we believe that this 
change is already covered by existing tests, such as 
PipelinedRegionSchedulingStrategyTest and DefaultExecutionTopologyTest.*
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (**yes** / no / 
don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

