Zhu Zhu created FLINK-15456:
-------------------------------
Summary: Job keeps failing on slot allocation timeout due to RM
not allocating new TMs for slot requests
Key: FLINK-15456
URL: https://issues.apache.org/jira/browse/FLINK-15456
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Zhu Zhu
Fix For: 1.10.0
Attachments: jm_part.log
As shown in the attached JM log, the job tried to start 30 TMs but only 29
registered. The job therefore fails because it cannot acquire all 30 required
slots in time.
When failover happens and the tasks are rescheduled, the RM does not request
new TMs even though it cannot fulfill the slot requests. As a result, the job
keeps failing with slot allocation timeouts.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)