[ https://issues.apache.org/jira/browse/FLINK-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090753#comment-17090753 ]
Till Rohrmann commented on FLINK-17330:
---------------------------------------
Thanks for reporting this issue [~zhuzh]. I think you are right that cyclic
dependencies between pipelined regions are a problem we have not considered.
Would it work to say that in the first version we don't support pipelined
regions which contain a blocking data exchange? Users would be able to work
around this problem by setting the data exchanges to blocking if they have such
a topology.
Once we have the first version of the pipelined region scheduler working, we
could then address the problem of cyclic dependencies. I think we would have to
detect cyclic dependencies between pipelined regions and merge all regions
which are part of the cycle into the same pipelined region. The cyclic
dependency detection should handle the problem of intra-region all-to-all
blocking edges as well as any other kind of cyclic cross-region dependencies.
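The detect-and-merge idea above could be sketched roughly as follows. This is only an illustration, not Flink code: it assumes regions are given as a dict of hypothetical names to vertex sets, that `depends_on` maps each region to the regions whose blocking results it consumes, and it uses Tarjan's strongly-connected-components algorithm as one possible way to find the cycles. The region R3 is invented for the example.

```python
def strongly_connected_components(nodes, successors):
    """Tarjan's algorithm; returns a list of sets of mutually reachable nodes."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(v):
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in successors.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop everything above it.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in nodes:
        if v not in index:
            visit(v)
    return components


def merge_cyclic_regions(regions, depends_on):
    """Merge all regions that lie on a common dependency cycle.

    regions:    dict of region name -> set of execution vertices (hypothetical model)
    depends_on: dict of region name -> set of regions whose blocking output it consumes
    """
    merged = []
    for component in strongly_connected_components(list(regions), depends_on):
        vertices = set()
        for name in component:
            vertices |= regions[name]
        merged.append(vertices)
    return merged


# R1 and R2 mirror the issue description; R3 is an assumed extra region.
regions = {
    "R1": {"A1", "B1", "C1", "D1"},
    "R2": {"A2", "B2", "C2", "D2"},
    "R3": {"E1"},
}
depends_on = {"R1": {"R2"}, "R2": {"R1"}, "R3": {"R1"}}
merged = merge_cyclic_regions(regions, depends_on)
```

In this sketch R1 and R2 collapse into one region because they depend on each other, while R3 only depends on R1 and stays separate.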
> Avoid scheduling deadlocks caused by cyclic input dependencies between regions
> ------------------------------------------------------------------------------
>
> Key: FLINK-17330
> URL: https://issues.apache.org/jira/browse/FLINK-17330
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.11.0
> Reporter: Zhu Zhu
> Priority: Major
> Fix For: 1.11.0
>
>
> Imagine a job like this:
> A -- (pipelined FORWARD) --> B -- (blocking ALL-to-ALL) --> D
> A -- (pipelined FORWARD) --> C -- (pipelined FORWARD) --> D
> parallelism=2 for all vertices.
> We will have 2 execution pipelined regions:
> R1 = {A1, B1, C1, D1}
> R2 = {A2, B2, C2, D2}
> R1 has a cross-region input edge (B2->D1).
> R2 has a cross-region input edge (B1->D2).
> A scheduling deadlock will happen since we schedule a region only when all its
> inputs are consumable (i.e. its blocking input partitions are finished): R1
> can be scheduled only after R2 finishes, while R2 can be scheduled only after
> R1 finishes.
> To avoid this, one solution is to force a logical pipelined region with
> intra-region ALL-to-ALL blocking edges to form only one execution pipelined
> region, so that there would not be cyclic input dependency between regions.
> Besides that, we should also take care to avoid cyclic dependencies caused by
> cross-region POINTWISE blocking edges.
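
The example in the quoted description can be reproduced with a small model. This is a sketch under stated assumptions rather than the scheduler's actual logic: the edge tuples, the union-find grouping, and the helper names (`expand`, `pipelined_regions`, `region_inputs`, `schedulable`) are all hypothetical and only mirror the topology and the "schedule when all inputs are consumable" rule described there.

```python
from collections import defaultdict

PARALLELISM = 2
# (producer, consumer, exchange type, distribution pattern), as in the description
JOB_EDGES = [
    ("A", "B", "pipelined", "forward"),
    ("B", "D", "blocking", "all-to-all"),
    ("A", "C", "pipelined", "forward"),
    ("C", "D", "pipelined", "forward"),
]


def expand(job_edges, parallelism):
    """Expand job-level edges into edges between subtasks such as A1 -> B1."""
    for src, dst, exchange, pattern in job_edges:
        for i in range(1, parallelism + 1):
            targets = [i] if pattern == "forward" else range(1, parallelism + 1)
            for j in targets:
                yield f"{src}{i}", f"{dst}{j}", exchange


def pipelined_regions(edges):
    """Group subtasks connected by pipelined edges, using union-find."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for src, dst, exchange in edges:
        root_src, root_dst = find(src), find(dst)
        if exchange == "pipelined":
            parent[root_src] = root_dst

    groups = defaultdict(set)
    for v in list(parent):
        groups[find(v)].add(v)
    return sorted(groups.values(), key=min)


def region_inputs(edges, regions):
    """Map each region index to the regions whose blocking results it consumes."""
    region_of = {v: idx for idx, region in enumerate(regions) for v in region}
    inputs = defaultdict(set)
    for src, dst, exchange in edges:
        if exchange == "blocking" and region_of[src] != region_of[dst]:
            inputs[region_of[dst]].add(region_of[src])
    return dict(inputs)


def schedulable(regions, inputs, finished):
    """Regions not yet finished whose input regions have all finished."""
    return [
        idx for idx in range(len(regions))
        if idx not in finished and inputs.get(idx, set()) <= finished
    ]


edges = list(expand(JOB_EDGES, PARALLELISM))
regions = pipelined_regions(edges)      # index 0 plays the role of R1, index 1 of R2
inputs = region_inputs(edges, regions)  # each region waits for the other
ready = schedulable(regions, inputs, finished=set())
```

With nothing finished yet, `ready` is empty in this model: each region waits for the other's blocking partitions, which is the deadlock described above.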