[
https://issues.apache.org/jira/browse/FLINK-18012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann updated FLINK-18012:
----------------------------------
Priority: Critical (was: Blocker)
> Deactivate slot timeout if TaskSlotTable.tryMarkSlotActive is called
> --------------------------------------------------------------------
>
> Key: FLINK-18012
> URL: https://issues.apache.org/jira/browse/FLINK-18012
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.9.3, 1.10.1, 1.11.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.11.0, 1.10.2, 1.9.4
>
>
> With FLINK-9932 we loosened the slot allocation protocol so that the
> {{JobMaster}} can deploy {{Tasks}} into a slot which has not been
> {{ACTIVATED}} but only {{ALLOCATED}} for a given job. This made it possible
> to better handle the case where the {{JobMasterGateway#offerSlots}} response
> was so late that it timed out. The solution was to offer a
> {{TaskSlotTable#tryMarkSlotActive}} method which, in contrast to
> {{TaskSlotTable#markSlotActive}}, does not fail if the requested slot is
> not available.
> However, the problem is that {{TaskSlotTable#tryMarkSlotActive}} does not
> deactivate the slot timeout. Hence, if the {{offerSlots}} response never
> arrives at the {{TaskExecutor}}, an {{ACTIVATED}} slot can still time out.
> In order to fix the problem, we should also deactivate the slot timeout when
> {{TaskSlotTable#tryMarkSlotActive}} is called.
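
The proposed fix can be illustrated with a minimal sketch. It is a simplified, hypothetical model and not Flink's actual {{TaskSlotTable}}: the class name {{SlotTableSketch}}, the set {{pendingTimeouts}} (standing in for a timer service), and the helpers {{allocateSlot}}, {{fireTimeout}}, {{isActive}} and {{hasPendingTimeout}} are all assumptions made for illustration. Only the idea that {{tryMarkSlotActive}} should also disarm the slot timeout comes from the issue text.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical model of a slot table; not Flink's actual classes. */
class SlotTableSketch {

    enum SlotState { ALLOCATED, ACTIVE }

    /** Per-slot bookkeeping: the owning job and the current state. */
    private static final class Slot {
        final String jobId;
        SlotState state = SlotState.ALLOCATED;

        Slot(String jobId) {
            this.jobId = jobId;
        }
    }

    private final Map<Integer, Slot> slots = new HashMap<>();

    // Slots whose timeout is still armed (assumed stand-in for a timer service).
    private final Set<Integer> pendingTimeouts = new HashSet<>();

    /** Allocates a slot for a job and arms its timeout. */
    void allocateSlot(int index, String jobId) {
        slots.put(index, new Slot(jobId));
        pendingTimeouts.add(index);
    }

    /** Lenient activation: returns false instead of throwing if the slot is unavailable. */
    boolean tryMarkSlotActive(String jobId, int index) {
        Slot slot = slots.get(index);
        if (slot == null || !slot.jobId.equals(jobId)) {
            return false;
        }
        slot.state = SlotState.ACTIVE;
        // The step the issue asks for: disarm the timeout on activation.
        pendingTimeouts.remove(index);
        return true;
    }

    /** Simulates the timer firing: frees the slot only if its timeout is still armed. */
    boolean fireTimeout(int index) {
        if (pendingTimeouts.remove(index)) {
            slots.remove(index);
            return true;
        }
        return false;
    }

    boolean isActive(int index) {
        Slot slot = slots.get(index);
        return slot != null && slot.state == SlotState.ACTIVE;
    }

    boolean hasPendingTimeout(int index) {
        return pendingTimeouts.contains(index);
    }

    public static void main(String[] args) {
        SlotTableSketch table = new SlotTableSketch();
        table.allocateSlot(0, "job-1");
        table.tryMarkSlotActive("job-1", 0);
        System.out.println("timeout fired: " + table.fireTimeout(0));
        System.out.println("slot still active: " + table.isActive(0));
    }
}
```

In this model, dropping the {{pendingTimeouts.remove(index)}} call from {{tryMarkSlotActive}} reproduces the reported behaviour: a later {{fireTimeout}} would free a slot that is already {{ACTIVE}}.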
--
This message was sent by Atlassian Jira
(v8.3.4#803005)