[
https://issues.apache.org/jira/browse/FLINK-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann closed FLINK-8732.
--------------------------------
Resolution: Fixed
Fixed via 519639c64039563ac4f2a875a8cfa630b25e4e8b
> Cancel scheduling operation when cancelling the ExecutionGraph
> --------------------------------------------------------------
>
> Key: FLINK-8732
> URL: https://issues.apache.org/jira/browse/FLINK-8732
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Major
> Labels: flip-6
> Fix For: 1.5.0
>
>
> With the Flip-6 changes and the support for queued scheduling, the
> {{ExecutionGraph}} must be able to handle cancellation calls when it is not
> yet fully scheduled. This is for example the case when waiting for new
> containers.
> A cancellation will cancel all {{Executions}}. As a result, slots that
> become available can still get assigned to {{Executions}} that are already
> canceled. Since a slot cannot be assigned to an {{Execution}} that is already
> canceled, this can fail the overall eager scheduling operation. The
> scheduling result callback will
> then trigger a global fail operation. This can happen before all
> {{Executions}} have been released and, thus, when the {{ExecutionGraph}} is
> still in the state {{CANCELLING}}. The result is that the {{ExecutionGraph}}
> goes into the state {{FAILING}} and then {{FAILED}}.
> In order to solve this problem, I propose to keep track of the scheduling
> operation and to cancel its result future when a concurrent {{suspend}},
> {{cancel}} or {{fail}} call happens.
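
The proposal can be illustrated with a minimal sketch. It is not the code from the fixing commit: MiniGraph, scheduleEager, failGlobal and the reduced JobState enum are hypothetical stand-ins for the ExecutionGraph, and the scheduling operation is modelled as a plain CompletableFuture. The graph keeps a reference to the scheduling future and cancels it in cancel(), so a later slot assignment failure can no longer reach the result callback.

```java
import java.util.concurrent.CompletableFuture;

public class SchedulingCancellationSketch {

    // Simplified subset of job states; the names follow the issue text.
    enum JobState { CREATED, RUNNING, CANCELLING, FAILING }

    // Hypothetical stand-in for the ExecutionGraph, not Flink's actual class.
    static class MiniGraph {
        private JobState state = JobState.CREATED;
        private CompletableFuture<Void> schedulingFuture; // tracked scheduling operation

        synchronized void scheduleEager(CompletableFuture<Void> allSlotsAssigned) {
            state = JobState.RUNNING;
            schedulingFuture = allSlotsAssigned;
            // Scheduling result callback: a genuine scheduling failure fails the job globally.
            allSlotsAssigned.whenComplete((ignored, failure) -> {
                if (failure != null && !allSlotsAssigned.isCancelled()) {
                    failGlobal();
                }
            });
        }

        synchronized void cancel() {
            state = JobState.CANCELLING;
            if (schedulingFuture != null) {
                // Completes the future now, so later slot failures cannot trigger the callback.
                schedulingFuture.cancel(false);
            }
        }

        synchronized void failGlobal() {
            state = JobState.FAILING;
        }

        synchronized JobState getState() {
            return state;
        }
    }

    public static void main(String[] args) {
        MiniGraph graph = new MiniGraph();
        CompletableFuture<Void> allSlotsAssigned = new CompletableFuture<>();
        graph.scheduleEager(allSlotsAssigned);
        graph.cancel();
        // A slot arrives late and cannot be assigned to an already canceled Execution.
        boolean delivered = allSlotsAssigned.completeExceptionally(
                new IllegalStateException("Execution already canceled"));
        System.out.println(delivered + " " + graph.getState()); // prints "false CANCELLING"
    }
}
```

In this sketch only cancel() cancels the tracked future. Following the proposal, suspend and fail calls would do the same.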