[
https://issues.apache.org/jira/browse/FLINK-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhilong Hong updated FLINK-21915:
---------------------------------
Labels: pull-request-available (was: auto-unassigned
pull-request-available)
> Optimize Execution#finishPartitionsAndUpdateConsumers
> -----------------------------------------------------
>
> Key: FLINK-21915
> URL: https://issues.apache.org/jira/browse/FLINK-21915
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.13.0
> Reporter: Zhilong Hong
> Priority: Major
> Labels: pull-request-available
>
> Based on the scheduler benchmark {{PartitionReleaseInBatchJobBenchmark}}
> introduced in FLINK-20612, we find that there's another procedure that has
> O(N^2) computation complexity:
> {{Execution#finishPartitionsAndUpdateConsumers}}.
> Once an execution is finished, it will finish all its BLOCKING partitions and
> update the partition info to all consumer vertices. The procedure can be
> illustrated as the following pseudo code:
> {code:java}
> for all Execution in ExecutionGraph:
> for all produced IntermediateResultPartition of the Execution:
> for all consumer ExecutionVertex of the IntermediateResultPartition:
> update or cache partition info{code}
> This procedure has O(N^2) complexity in total.
> Based on FLINK-21326, the consumed partitions are grouped if they are
> connected to the same consumer vertices. Therefore, we can update partition
> info of the entire ConsumedPartitionGroup in batch, rather than one by one.
> This will decrease the complexity from O(N^2) to O(N).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)