[ https://issues.apache.org/jira/browse/FLINK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann closed FLINK-9583.
--------------------------------
Resolution: Duplicate
Fix Version/s: (was: 1.6.0)
The issue seems to be a duplicate of FLINK-9635. A temporary fix disables the
local recovery scheduling logic when local recovery is disabled (see
FLINK-9634). The fix is included in Flink 1.5.1 and 1.6.0, so you should be
safe as long as you don't activate local recovery.
> Wrong number of TaskManagers' slots after recovery.
> ---------------------------------------------------
>
> Key: FLINK-9583
> URL: https://issues.apache.org/jira/browse/FLINK-9583
> Project: Flink
> Issue Type: Bug
> Components: ResourceManager
> Affects Versions: 1.5.0
> Environment: Flink 1.5.0 on YARN with the default execution mode.
> Reporter: Truong Duc Kien
> Priority: Major
> Attachments: jm.log
>
>
> We started a job with 120 slots, using a FixedDelayRestart strategy with a
> delay of 1 minute.
> During recovery, some but not all slots were released.
> When the job restarts again, Flink requests a new batch of slots.
> The total number of slots is now 193, larger than the configured amount, but
> the excess slots are never released.
>
> This bug does not happen in legacy mode. I've attached the JobManager log.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)