[ https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhu Zhu updated FLINK-22017:
----------------------------
Comment: was deleted
(was: I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I
help the community manage its development. I see this issue has been marked as
Critical but is unassigned and neither itself nor its Sub-Tasks have been
updated for 7 days. I have gone ahead and marked it "stale-critical". If this
ticket is critical, please either assign yourself or give an update.
Afterwards, please remove the label or in 7 days the issue will be
deprioritized.
)
> Regions may never be scheduled when there are cross-region blocking edges
> -------------------------------------------------------------------------
>
> Key: FLINK-22017
> URL: https://issues.apache.org/jira/browse/FLINK-22017
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.2, 1.13.0
> Reporter: Zhilong Hong
> Assignee: Zhilong Hong
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: Illustration.jpg
>
>
> For topologies with cross-region blocking edges, some regions may never be
> scheduled. The case is illustrated in the figure below.
> !Illustration.jpg!
> Vertices are denoted as v<layer>_<number>. The edge connecting v2_2 and
> v3_2 crosses from region 1 into region 2. Since region 1 does not consume any
> blocking result from other regions, it is scheduled first. When v2_2 finishes,
> {{onExecutionStateChange}} of PipelinedRegionSchedulingStrategy is triggered
> for it.
> One would expect region 2 to be scheduled at this point, since all of its
> consumed partitions should now be consumable. In fact, region 2 is never
> scheduled, because the result partition of v2_2 is not marked as consumable:
> whether a partition is consumable is determined by its IntermediateDataSet.
> However, an IntermediateDataSet is consumable if and only if the producers of
> all its IntermediateResultPartitions have finished. Since v2_3 is never
> scheduled, this IntermediateDataSet never becomes consumable. The result is a
> deadlock: a region can never be scheduled because it waits on a vertex that
> only runs once that same region has been scheduled.
> As a solution, BLOCKING result partitions should become consumable
> individually, i.e. as soon as their own producer finishes. Note that this makes
> scheduling execution-vertex-wise instead of stage-wise, with the beneficial
> side effect of better resource utilization. Along with this change,
> PipelinedRegionSchedulingStrategy can be simplified by removing the
> correlatedResultPartitions.
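The following is a hypothetical, heavily simplified Java sketch (Java 16+ for records) that makes the circular wait concrete. It compares the current data-set-wise consumability rule with the proposed per-partition rule. None of these types or methods are Flink APIs. `Partition`, `Region`, `isConsumable` and `simulate` are illustrative names only. The three-region topology is an assumed stand-in, not the exact graph in Illustration.jpg.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of FLINK-22017. These records are NOT Flink classes.
final class RegionSchedulingSketch {

    // A BLOCKING result partition: the vertex that produces it and the data set it belongs to.
    record Partition(String id, String producer, String dataSet) {}

    // A pipelined region: its vertices and the BLOCKING partitions it consumes from other regions.
    record Region(String name, Set<String> vertices, Set<String> consumedPartitions) {}

    // Old rule (perPartition = false): a partition is consumable only when every producer
    // of its data set has finished. Proposed rule (perPartition = true): it is consumable
    // as soon as its own producer has finished.
    static boolean isConsumable(Partition p, Map<String, Partition> all,
                                Set<String> finished, boolean perPartition) {
        if (perPartition) {
            return finished.contains(p.producer());
        }
        for (Partition q : all.values()) {
            if (q.dataSet().equals(p.dataSet()) && !finished.contains(q.producer())) {
                return false;
            }
        }
        return true;
    }

    // Repeatedly schedules every region whose consumed partitions are all consumable,
    // until no more progress is possible. Simplification: a scheduled region is
    // assumed to run all of its vertices to completion immediately.
    static List<String> simulate(List<Region> regions, Map<String, Partition> partitions,
                                 boolean perPartition) {
        List<String> scheduled = new ArrayList<>();
        Set<String> finished = new HashSet<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Region r : regions) {
                if (scheduled.contains(r.name())) {
                    continue;
                }
                boolean ready = r.consumedPartitions().stream()
                        .allMatch(id -> isConsumable(partitions.get(id), partitions, finished, perPartition));
                if (ready) {
                    scheduled.add(r.name());
                    finished.addAll(r.vertices());
                    progress = true;
                }
            }
        }
        return scheduled;
    }

    // Stand-in topology: v2_2 and v2_3 produce the two partitions of data set "D2".
    // Region 2 consumes p2_2 but also contains v2_3 itself.
    static Map<String, Partition> examplePartitions() {
        return Map.of(
                "p2_2", new Partition("p2_2", "v2_2", "D2"),
                "p2_3", new Partition("p2_3", "v2_3", "D2"));
    }

    static List<Region> exampleRegions() {
        return List.of(
                new Region("region1", Set.of("v1_2", "v2_2"), Set.of()),
                new Region("region2", Set.of("v2_3", "v3_2"), Set.of("p2_2")),
                new Region("region3", Set.of("v3_3"), Set.of("p2_3")));
    }

    public static void main(String[] args) {
        System.out.println("data-set-wise: " + simulate(exampleRegions(), examplePartitions(), false));
        System.out.println("per-partition: " + simulate(exampleRegions(), examplePartitions(), true));
    }
}
```

Under these assumptions, `main` prints `[region1]` for the data-set-wise rule. Region 2 waits on D2, and D2 waits on v2_3, which only runs inside region 2. The per-partition rule prints `[region1, region2, region3]`.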
--
This message was sent by Atlassian Jira
(v8.3.4#803005)