[ https://issues.apache.org/jira/browse/FLINK-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhijiang closed FLINK-6325.
---------------------------
Resolution: Duplicate
> Refinement of slot reuse for task manager failure
> -------------------------------------------------
>
> Key: FLINK-6325
> URL: https://issues.apache.org/jira/browse/FLINK-6325
> Project: Flink
> Issue Type: Improvement
> Components: JobManager
> Reporter: zhijiang
> Assignee: zhijiang
> Priority: Minor
>
> After a task or TaskManager failure, the new execution attempt by default
> tries to take the slot used by the prior execution attempt. This can benefit
> state recovery locality with the RocksDB backend, and it makes sense for the
> task failure scenario.
> In the TaskManager failure scenario, however, the slots on the failed
> TaskManager are released and can no longer be reused. When such an execution
> is reset and allocates a slot from the {{SlotPool}}, no slot matches its prior
> {{ResourceID}}, so it falls back to any other available slot with a matching
> {{ResourceProfile}}. As a result, an execution from the failed
> {{TaskManager}} occupies a slot that previously belonged to another parallel
> execution, and the subsequent executions may no longer be able to reuse their
> previous slots either. This hurts state recovery.
> To solve this problem, we would like to request a new slot for re-deployment
> when the prior location is unavailable, so that the execution no longer
> occupies slots that other, still running executions could reuse.
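The cascade described above, and the effect of the proposed rule, can be illustrated with a deliberately simplified Java model. This is a sketch under stated assumptions, not Flink's actual {{SlotPool}} code: the class SlotReuseSketch, the allocate and restartAfterTm1Failure methods, the "NEW" marker, and the three-TaskManager scenario (tm1 fails, its execution restarts first) are all hypothetical. Slots are matched by a string ResourceID and a string profile only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of slot selection on restart (not Flink's API). */
public class SlotReuseSketch {

    /** A free slot: the TaskManager (by ResourceID) hosting it and its ResourceProfile. */
    static final class Slot {
        final String resourceId;
        final String profile;

        Slot(String resourceId, String profile) {
            this.resourceId = resourceId;
            this.profile = profile;
        }
    }

    /** Hypothetical marker: no free slot is taken, a new slot must be requested. */
    static final String REQUEST_NEW_SLOT = "NEW";

    /**
     * Picks a slot for one restarted execution and removes it from the free list.
     * proposed == false: prior ResourceID first, then any slot with a matching profile.
     * proposed == true: if the prior TaskManager is no longer alive, take nothing and
     * request a new slot, leaving surviving slots to their previous owners.
     */
    static String allocate(List<Slot> free, Set<String> alive,
                           String priorResourceId, String profile, boolean proposed) {
        // 1. Try to reuse a slot on the same TaskManager as the prior attempt.
        for (Iterator<Slot> it = free.iterator(); it.hasNext(); ) {
            Slot s = it.next();
            if (s.resourceId.equals(priorResourceId) && s.profile.equals(profile)) {
                it.remove();
                return s.resourceId;
            }
        }
        // 2. Proposed rule: the prior location is gone, so do not take another slot.
        if (proposed && !alive.contains(priorResourceId)) {
            return REQUEST_NEW_SLOT;
        }
        // 3. Fallback: take any free slot whose profile matches.
        for (Iterator<Slot> it = free.iterator(); it.hasNext(); ) {
            Slot s = it.next();
            if (s.profile.equals(profile)) {
                it.remove();
                return s.resourceId;
            }
        }
        return REQUEST_NEW_SLOT;
    }

    /** Restarts executions previously on tm1, tm2, tm3 (in that order) after tm1 failed. */
    static List<String> restartAfterTm1Failure(boolean proposed) {
        Set<String> alive = Set.of("tm2", "tm3");
        List<Slot> free = new ArrayList<>(
                List.of(new Slot("tm2", "default"), new Slot("tm3", "default")));
        List<String> assigned = new ArrayList<>();
        for (String prior : List.of("tm1", "tm2", "tm3")) {
            assigned.add(allocate(free, alive, prior, "default", proposed));
        }
        return assigned;
    }

    public static void main(String[] args) {
        System.out.println("current:  " + restartAfterTm1Failure(false));
        System.out.println("proposed: " + restartAfterTm1Failure(true));
    }
}
```

Running main prints "current:  [tm2, tm3, NEW]" and "proposed: [NEW, tm2, tm3]". In this model, the fallback lets the execution from tm1 take tm2's slot, which pushes the next execution onto tm3, so neither survivor keeps its previous slot. With the proposed rule, only the execution whose TaskManager died needs a fresh slot, and both survivors keep their previous locations.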
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)