Thesharing opened a new pull request #15314:
URL: https://github.com/apache/flink/pull/15314


   ## What is the purpose of the change
   
   *This pull request introduces an optimization for releasing result partitions in RegionPartitionReleaseStrategy.*
   *RegionPartitionReleaseStrategy is responsible for releasing result partitions once all of their downstream consumer tasks have finished.*
   
   *The current implementation is:*
   ```
   for each consumed SchedulingResultPartition of the finished SchedulingPipelinedRegion:
     for each consumer SchedulingPipelinedRegion of the SchedulingResultPartition:
       check whether the consumer region is finished
     if all the consumer regions are finished:
       release the SchedulingResultPartition
   ```
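   *The Java sketch below shows how much work the current loop does. It is a minimal sketch under simplifying assumptions. Regions and partitions are plain `int` ids. The class `NaiveReleaseSketch` and its maps are hypothetical and are not Flink's `SchedulingPipelinedRegion`/`SchedulingResultPartition` API.*

   ```java
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical model of the current loop; int ids stand in for Flink's scheduling types.
   class NaiveReleaseSketch {
       private final Map<Integer, List<Integer>> consumedPartitionsByRegion;
       private final Map<Integer, List<Integer>> consumerRegionsByPartition;
       private final Set<Integer> finishedRegions = new HashSet<>();

       NaiveReleaseSketch(Map<Integer, List<Integer>> consumedPartitionsByRegion,
                          Map<Integer, List<Integer>> consumerRegionsByPartition) {
           this.consumedPartitionsByRegion = consumedPartitionsByRegion;
           this.consumerRegionsByPartition = consumerRegionsByPartition;
       }

       // Marks a region as finished and returns the partitions that can now be released.
       List<Integer> onRegionFinished(int region) {
           finishedRegions.add(region);
           List<Integer> releasable = new ArrayList<>();
           // Outer loop: every partition consumed by the finished region (up to N).
           for (int partition : consumedPartitionsByRegion.getOrDefault(region, List.of())) {
               boolean allConsumersFinished = true;
               // Inner loop: every consumer region of that partition (up to N).
               for (int consumer : consumerRegionsByPartition.getOrDefault(partition, List.of())) {
                   if (!finishedRegions.contains(consumer)) {
                       allConsumersFinished = false;
                       break;
                   }
               }
               if (allConsumersFinished) {
                   releasable.add(partition);
               }
           }
           return releasable;
       }
   }
   ```

   *The `break` only helps while some consumer is still running. Once the last consumer region finishes, the full consumer list of every partition is scanned.*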
   
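   *In the worst case (for example, an all-to-all edge between N result partitions and N consumer regions), handling one finished region takes O(N^2) time: it touches N consumed partitions, each with N consumer regions to check. Every region eventually finishes, so releasing all result partitions takes O(N^3) time in total.*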
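   *As an illustration, assume an all-to-all edge with N result partitions and N consumer regions, where each region consumes every partition. The total work of the current loop then multiplies out as follows:*

   ```latex
   \underbrace{N}_{\text{finished regions}}
   \times \underbrace{N}_{\text{partitions per region}}
   \times \underbrace{N}_{\text{consumers per partition}}
   = O(N^3)
   ```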
   
   *Based on FLINK-21228 and FLINK-21330, the consumed result partitions of a pipelined region are grouped into ConsumedPartitionGroups. The result partitions in one group are isomorphic, meaning they are consumed by the same consumer regions. We therefore only need to cache the finished status of the pipelined regions and the fully consumed status of each ConsumedPartitionGroup.*
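   *The grouping itself comes from FLINK-21228 and FLINK-21330. The hypothetical helper below only illustrates the property the optimization relies on: partitions with identical consumer-region sets can share one cached "fully consumed" flag. It makes no claim about how those issues actually build the groups.*

   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical illustration of grouping partitions by identical consumer-region sets.
   class PartitionGroupingSketch {
       // Partitions whose consumer sets are equal land in the same group, so
       // "all consumers finished" needs to be tracked once per group, not per partition.
       static Map<Set<Integer>, List<Integer>> groupByConsumers(
               Map<Integer, Set<Integer>> consumerRegionsByPartition) {
           Map<Set<Integer>, List<Integer>> groups = new LinkedHashMap<>();
           consumerRegionsByPartition.forEach((partition, consumers) ->
                   groups.computeIfAbsent(consumers, k -> new ArrayList<>()).add(partition));
           return groups;
       }
   }
   ```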
   
   *The optimized implementation is:*
   ```
   for each ConsumedPartitionGroup of the finished SchedulingPipelinedRegion:
     if all consumer SchedulingPipelinedRegions of the ConsumedPartitionGroup are finished:
       mark the ConsumedPartitionGroup as fully consumed
       for each result partition in the ConsumedPartitionGroup:
         if all the ConsumedPartitionGroups it belongs to are fully consumed:
           release the result partition
   ```
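   *Below is a minimal Java sketch of the optimized loop. As before, it uses plain `int` ids and a hypothetical class name. One assumption goes beyond the pseudocode: the "all consumer regions finished" check reads a cached per-group counter of unfinished consumer regions, so it costs O(1) instead of a scan.*

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical model of the optimized loop; int ids stand in for Flink's scheduling types.
   class GroupedReleaseSketch {
       private final Map<Integer, List<Integer>> consumedGroupsByRegion; // region -> groups it consumes
       private final Map<Integer, List<Integer>> partitionsByGroup;      // group -> partitions in it
       private final Map<Integer, List<Integer>> groupsByPartition = new HashMap<>();
       private final Map<Integer, Integer> unfinishedConsumers = new HashMap<>(); // cached per group
       private final Set<Integer> fullyConsumedGroups = new HashSet<>();          // cached per group

       GroupedReleaseSketch(Map<Integer, List<Integer>> consumedGroupsByRegion,
                            Map<Integer, List<Integer>> partitionsByGroup) {
           this.consumedGroupsByRegion = consumedGroupsByRegion;
           this.partitionsByGroup = partitionsByGroup;
           // Each group waits for every region that lists it as a consumed group.
           consumedGroupsByRegion.values().forEach(groups ->
                   groups.forEach(group -> unfinishedConsumers.merge(group, 1, Integer::sum)));
           // Reverse index: the groups each partition belongs to.
           partitionsByGroup.forEach((group, partitions) ->
                   partitions.forEach(partition ->
                           groupsByPartition.computeIfAbsent(partition, k -> new ArrayList<>()).add(group)));
       }

       // Marks a region as finished and returns the partitions that can now be released.
       List<Integer> onRegionFinished(int region) {
           List<Integer> releasable = new ArrayList<>();
           for (int group : consumedGroupsByRegion.getOrDefault(region, List.of())) {
               // O(1) cached check instead of scanning every consumer region.
               if (unfinishedConsumers.merge(group, -1, Integer::sum) > 0) {
                   continue;
               }
               fullyConsumedGroups.add(group);
               for (int partition : partitionsByGroup.getOrDefault(group, List.of())) {
                   // Release only once every group containing the partition is fully consumed.
                   if (fullyConsumedGroups.containsAll(groupsByPartition.get(partition))) {
                       releasable.add(partition);
                   }
               }
           }
           return releasable;
       }
   }
   ```

   *Each group becomes fully consumed exactly once. As a result, a partition is examined only when one of its groups completes, not once per finished consumer region.*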
   
   *After the optimization, the total complexity of releasing all result partitions decreases from O(N^3) to O(N).*
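   *Keep the same all-to-all assumption. Also assume the N consumer regions share a single ConsumedPartitionGroup and that the consumer check is answered in O(1) from the cache, for example with a per-group counter of unfinished regions. The group is then updated once per finished region, and its N partitions are examined once, when it becomes fully consumed:*

   ```latex
   \underbrace{N}_{\text{cache updates}} + \underbrace{N}_{\text{partition checks}} = O(N)
   ```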
   
   *For more details, please check FLINK-21332.*
   
   ## Brief change log
   
     - *Optimize RegionPartitionReleaseStrategy#filterReleasablePartitions*
   
   
   ## Verifying this change
   
   *Since this optimization does not change the original logic of releasing result partitions in RegionPartitionReleaseStrategy, we believe this change is already covered by RegionPartitionReleaseStrategyTest.*
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.