Truong Duc Kien created FLINK-9583:

             Summary: Wrong number of TaskManagers' slots after recovery.
                 Key: FLINK-9583
             Project: Flink
          Issue Type: Bug
          Components: ResourceManager
    Affects Versions: 1.5.0
         Environment: Flink 1.5.0 on YARN with the default execution mode.
            Reporter: Truong Duc Kien
         Attachments: jm.log

We started a job with 120 slots, using a FixedDelayRestart strategy with a 
delay of 1 minute.

During recovery, some but not all of the slots were released.

When the job restarted, Flink requested a new batch of slots.

The total number of slots is now 193, which exceeds the configured amount, and 
the excess slots are never released.


This bug does not happen in legacy mode. I've attached the JobManager log.


This message was sent by Atlassian JIRA
