[
https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhilong Hong updated FLINK-22017:
---------------------------------
Fix Version/s: 1.14.0
> Regions may never be scheduled when there are cross-region blocking edges
> -------------------------------------------------------------------------
>
> Key: FLINK-22017
> URL: https://issues.apache.org/jira/browse/FLINK-22017
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.2, 1.13.0
> Reporter: Zhilong Hong
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: Illustration.jpg
>
>
> For the topology with cross-region blocking edges, there are regions that may
> never be scheduled. The case is illustrated in the figure below.
> !Illustration.jpg!
> Let's denote each vertex as v<layer>_<number>. The edge that connects v2_2 and
> v3_2 crosses from region 1 into region 2. Since region 1 has no incoming
> blocking edges from other regions, it will be scheduled first. When v2_2
> finishes, PipelinedRegionSchedulingStrategy will trigger
> {{onExecutionStateChange}} for it.
> One would expect region 2 to be scheduled at this point, since all of its
> consumed partitions should be consumable. In fact region 2 won't be scheduled,
> because the result partition of v2_2 is not tagged as consumable. Whether it
> is consumable or not is determined by its IntermediateDataSet.
> However, an IntermediateDataSet is consumable if and only if all the
> producers of its IntermediateResultPartitions are finished. This
> IntermediateDataSet will never be consumable, since v2_3 is never
> scheduled. All in all, this forms a deadlock in which a region is never
> scheduled because it has not been scheduled yet.
> As a solution, we should let BLOCKING result partitions be consumable
> individually. Note that this makes scheduling execution-vertex-wise
> instead of stage-wise, with the nice side effect of better resource
> utilization. The PipelinedRegionSchedulingStrategy can be simplified along
> with this change by getting rid of the correlatedResultPartitions.
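
To make the two consumability rules in the description concrete, here is a minimal toy sketch in Java. It is not Flink code: the class BlockingPartitionSketch, its Partition type, and both methods are hypothetical names invented for illustration, and the sketch assumes (as the description implies) that v2_2 and v2_3 produce partitions of the same IntermediateDataSet and that v2_3 can only run once region 2 is scheduled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these types are Flink classes. */
public class BlockingPartitionSketch {

    /** Hypothetical stand-in for a BLOCKING result partition; it only records its producer. */
    public static final class Partition {
        final String producer;

        public Partition(String producer) {
            this.producer = producer;
        }
    }

    /** Data-set-level rule from the description: consumable only if every producer has finished. */
    public static boolean consumableByDataSet(List<Partition> dataSet, Set<String> finished) {
        return dataSet.stream().allMatch(p -> finished.contains(p.producer));
    }

    /** Proposed rule: a partition is consumable as soon as its own producer has finished. */
    public static boolean consumableIndividually(Partition partition, Set<String> finished) {
        return finished.contains(partition.producer);
    }

    public static void main(String[] args) {
        Partition fromV2_2 = new Partition("v2_2");
        Partition fromV2_3 = new Partition("v2_3");
        // Assumption: both partitions belong to the same data set.
        List<Partition> dataSet = Arrays.asList(fromV2_2, fromV2_3);

        // Region 1 has run, so v2_2 is finished; v2_3 is still waiting to be scheduled.
        Set<String> finished = Collections.singleton("v2_2");

        // Old rule: stays false until v2_3 finishes, which cannot happen before
        // region 2 is scheduled, so region 2 waits on itself.
        System.out.println("data-set rule: " + consumableByDataSet(dataSet, finished));

        // New rule: the partition consumed across the region boundary is ready,
        // so region 2 can be scheduled.
        System.out.println("per-partition rule: " + consumableIndividually(fromV2_2, finished));
    }
}
```

Running main prints "data-set rule: false" followed by "per-partition rule: true". Under the data-set rule the result can only flip to true after v2_3 finishes, which is the circular dependency described above; the per-partition rule removes that dependency.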
--
This message was sent by Atlassian Jira
(v8.3.4#803005)