[ 
https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-22017:
-----------------------------------
    Labels: auto-deprioritized-critical stale-major  (was: 
auto-deprioritized-critical)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Major but is unassigned and neither it nor its Sub-Tasks have been updated 
for 30 days. I have gone ahead and added a "stale-major" label to the issue. If 
this ticket is a Major, please either assign yourself or give an update. 
Afterwards, please remove the label or in 7 days the issue will be deprioritized.


> Regions may never be scheduled when there are cross-region blocking edges
> -------------------------------------------------------------------------
>
>                 Key: FLINK-22017
>                 URL: https://issues.apache.org/jira/browse/FLINK-22017
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.3, 1.12.2, 1.13.0
>            Reporter: Zhilong Hong
>            Priority: Major
>              Labels: auto-deprioritized-critical, stale-major
>         Attachments: Illustration.jpg
>
>
> In a topology with cross-region blocking edges, there are regions that may 
> never be scheduled. The case is illustrated in the figure below.
> !Illustration.jpg!
> Let's denote the vertices as v<layer>_<number>. The edge that connects v2_2 
> and v3_2 clearly crosses region 1 and region 2. Since region 1 has no 
> incoming blocking edges from other regions, it will be scheduled first. When 
> v2_2 is finished, PipelinedRegionSchedulingStrategy will trigger 
> {{onExecutionStateChange}} for it.
> One would expect region 2 to be scheduled at this point, since all its 
> consumed partitions should be consumable. In fact, region 2 won't be 
> scheduled, because the result partition of v2_2 is not tagged as consumable. 
> Whether it is consumable or not is determined by its IntermediateDataSet.
> However, an IntermediateDataSet is consumable if and only if all the 
> producers of its IntermediateResultPartitions are finished. This 
> IntermediateDataSet will never be consumable since v2_3 is never scheduled. 
> All in all, this forms a deadlock: a region will never be scheduled because 
> it has not been scheduled.
> As a solution, we should let BLOCKING result partitions be consumable 
> individually. Note that this will cause scheduling to become 
> execution-vertex-wise instead of stage-wise, with the nice side effect of 
> better resource utilization. The PipelinedRegionSchedulingStrategy can be 
> simplified along with this change to get rid of the correlatedResultPartitions.
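
The deadlock described above, and the effect of the proposed fix, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Flink code: the `Partition`, `Region`, `build_topology` and `simulate` names are hypothetical, the topology is a simplified guess at the attached figure (v2_2 in region 1, v2_3 and v3_2 in region 2, with v2_2 and v2_3 producing partitions of the same data set), and every vertex of a scheduled region is assumed to run to completion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    producer: str  # vertex that produces this blocking result partition
    dataset: str   # hypothetical id of the data set the partition belongs to


@dataclass
class Region:
    name: str
    vertices: set   # vertices inside this pipelined region
    consumed: list  # blocking partitions consumed from other regions


def build_topology():
    # Assumed topology: v2_2 (region 1) and v2_3 (region 2) produce partitions
    # of the same data set; v3_2 (region 2) consumes the partition of v2_2.
    # Other vertices of the figure are omitted.
    p22 = Partition("v2_2", "ds")
    p23 = Partition("v2_3", "ds")
    region1 = Region("region1", {"v2_2"}, [])
    region2 = Region("region2", {"v2_3", "v3_2"}, [p22])
    return [region1, region2], [p22, p23]


def simulate(regions, partitions, per_partition):
    """Return the names of the regions that get scheduled, in order."""
    finished = set()
    scheduled = []

    def consumable(p):
        if per_partition:
            # Proposed behaviour: a partition is consumable on its own.
            return p.producer in finished
        # Described behaviour: all producers of the data set must be finished.
        return all(q.producer in finished
                   for q in partitions if q.dataset == p.dataset)

    progress = True
    while progress:
        progress = False
        for region in regions:
            if region.name in scheduled:
                continue
            if all(consumable(p) for p in region.consumed):
                scheduled.append(region.name)
                finished |= region.vertices  # assume the region completes
                progress = True
    return scheduled


regions, partitions = build_topology()
print(simulate(regions, partitions, per_partition=False))  # ['region1']
print(simulate(regions, partitions, per_partition=True))   # ['region1', 'region2']
```

With the data-set-wide check, region 2 waits for v2_3, which is itself part of region 2, so it is never scheduled. With the per-partition check, region 2 becomes schedulable as soon as v2_2 finishes.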



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
