[ 
https://issues.apache.org/jira/browse/FLINK-21920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-21920:
---------------------------------
    Summary: Optimize ExecutionGraphToInputsLocationsRetrieverAdapter  (was: 
Optimize DefaultScheduler#allocateSlots)

> Optimize ExecutionGraphToInputsLocationsRetrieverAdapter
> --------------------------------------------------------
>
>                 Key: FLINK-21920
>                 URL: https://issues.apache.org/jira/browse/FLINK-21920
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0
>            Reporter: Zhilong Hong
>            Assignee: Zhilong Hong
>            Priority: Major
>              Labels: auto-unassigned, pull-request-available, stale-assigned
>
> Based on the scheduler benchmark introduced in FLINK-21731, we find that 
> there's a procedure related to {{DefaultScheduler#allocateSlots}} that has 
> O(N^2) complexity, which is: 
> {{ExecutionGraphToInputsLocationsRetrieverAdapter#getConsumedResultPartitionsProducers}}.
> The original implementation is:
> {code:java}
> for all SchedulingExecutionVertex in DefaultScheduler:
>   for all ConsumedPartitionGroup of the SchedulingExecutionVertex:
>     for all IntermediateResultPartition in the ConsumedPartitionGroup:
>       get producer of the IntermediateResultPartition {code}
> This procedure has O(N^2) complexity.
> We can see that for each SchedulingExecutionVertex, the producers of its 
> ConsumedPartitionGroup is calculated separately. For the 
> SchedulingExecutionVertices in the same ConsumerVertexGroup, they have the 
> same ConsumedPartitionGroup. Therefore, we don't need to calculate the 
> producers over and over again. We can use a local cache to cache the 
> producers. This will decrease the complexity from O(N^2) to O(N).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to