[jira] [Comment Edited] (FLINK-15456) Job keeps failing on slot allocation timeout due to RM not allocating new TMs for slot requests

Zhu Zhu (Jira) Thu, 02 Jan 2020 01:42:59 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006696#comment-17006696
 ]


Zhu Zhu edited comment on FLINK-15456 at 1/2/20 9:41 AM:
---------------------------------------------------------

[~xintongsong] Yes, it looks like the case described in FLINK-13554. 
Do you have any idea how to solve it without much risk?
I will also try to reproduce the issue with DEBUG logs.

cc: [~trohrmann]


was (Author: zhuzh):
This issue looks like the case described in FLINK-13554. 
[~xintongsong] do you have any idea how to solve it without much risk?

cc: [~trohrmann]

> Job keeps failing on slot allocation timeout due to RM not allocating new TMs 
> for slot requests
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15456
>                 URL: https://issues.apache.org/jira/browse/FLINK-15456
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zhu Zhu
>            Priority: Blocker
>             Fix For: 1.10.0
>
>         Attachments: jm_part.log
>
>
> As shown in the attached JM log, the job tried to start 30 TMs but only 29 
> registered. The job therefore fails because it cannot acquire all 30 
> required slots in time.
> When failover happens and tasks are re-scheduled, the RM does not request 
> new TMs even though it cannot fulfill the slot requests, so the job keeps 
> failing on slot allocation timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
