[ 
https://issues.apache.org/jira/browse/FLINK-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-21915:
---------------------------------
    Labels: pull-request-available  (was: auto-unassigned 
pull-request-available)

> Optimize Execution#finishPartitionsAndUpdateConsumers
> -----------------------------------------------------
>
>                 Key: FLINK-21915
>                 URL: https://issues.apache.org/jira/browse/FLINK-21915
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0
>            Reporter: Zhilong Hong
>            Priority: Major
>              Labels: pull-request-available
>
> Based on the scheduler benchmark {{PartitionReleaseInBatchJobBenchmark}} 
> introduced in FLINK-20612, we find that there's another procedure that has 
> O(N^2) computation complexity: 
> {{Execution#finishPartitionsAndUpdateConsumers}}. 
> Once an execution is finished, it will finish all its BLOCKING partitions and 
> update the partition info to all consumer vertices. The procedure can be 
> illustrated as the following pseudo code:
> {code:java}
> for all Execution in ExecutionGraph:
>   for all produced IntermediateResultPartition of the Execution:
>     for all consumer ExecutionVertex of the IntermediateResultPartition:
>       update or cache partition info{code}
> This procedure has O(N^2) complexity in total.
> Based on FLINK-21326, the consumed partitions are grouped if they are 
> connected to the same consumer vertices. Therefore, we can update partition 
> info of the entire ConsumedPartitionGroup in batch, rather than one by one. 
> This will decrease the complexity from O(N^2) to O(N).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to