[ 
https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-22017.
---------------------------
      Assignee: Zhilong Hong
    Resolution: Fixed

Fixed via 
d2005268b1eeb0fe928b69c5e56ca54862fbf508
eb8100f7afe1cd2b6fceb55b174de097db752fc7

> Regions may never be scheduled when there are cross-region blocking edges
> -------------------------------------------------------------------------
>
>                 Key: FLINK-22017
>                 URL: https://issues.apache.org/jira/browse/FLINK-22017
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.3, 1.12.2, 1.13.0
>            Reporter: Zhilong Hong
>            Assignee: Zhilong Hong
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>         Attachments: Illustration.jpg
>
>
> For the topology with cross-region blocking edges, there are regions that may 
> never be scheduled. The case is illustrated in the figure below.
> !Illustration.jpg!
> Let's denote the vertices with layer_number. It's clear that the edge 
> connects v2_2 and v3_2 crosses region 1 and region 2. Since region 1 has no 
> blocking edges connected to other regions, it will be scheduled first. When 
> vertex2_2 is finished, PipelinedRegionSchedulingStrategy will trigger 
> {{onExecutionStateChange}} for it.
> As expected, region 2 will be scheduled since all its consumed partitions are 
> consumable. But in fact region 2 won't be scheduled, because the result 
> partition of vertex2_2 is not tagged as consumable. Whether it is consumable 
> or not is determined by its IntermediateDataSet.
> However, an IntermediateDataSet is consumable if and only if all the 
> producers of its IntermediateResultPartitions are finished. This 
> IntermediateDataSet will never be consumable since vertex2_3 is not 
> scheduled. All in all, this forms a deadlock that a region will never be 
> scheduled because it's not scheduled.
> As a solution we should let BLOCKING result partitions be consumable 
> individually. Note that this will result in the scheduling to become 
> execution-vertex-wise instead of stage-wise, with a nice side effect towards 
> better resource utilization. The PipelinedRegionSchedulingStrategy can be 
> simplified along with change to get rid of the correlatedResultPartitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to