[ https://issues.apache.org/jira/browse/FLINK-20138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232006#comment-17232006 ]
Till Rohrmann commented on FLINK-20138:
---------------------------------------
Without the debug logs, it is hard to tell what's going wrong in your case.
> Flink Job can not recover due to timeout of requiring slots when flink
> jobmanager restarted
> --------------------------------------------------------------------------------------------
>
> Key: FLINK-20138
> URL: https://issues.apache.org/jira/browse/FLINK-20138
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN, Table SQL / Runtime
> Environment: flink: 1.9.2
> hadoop: 2.7.2
> jdk: 1.8
> Reporter: wgcn
> Priority: Major
> Attachments: 2820F7EE-85F9-441D-95D5-8163FB6267DF.png,
> jobmanager.log, zk_resource_address_info.png
>
>
> Our Flink jobs run on YARN in per-job mode. We stopped some NodeManager
> machines, and the AMs on those machines restarted on other NodeManagers.
> We found that some jobs cannot recover because their slot requests time out.
> *SlotPoolImpl never connected to the ResourceManager*
> ```
> 2020-11-09 16:31:31,794 INFO
> flink-akka.actor.default-dispatcher-16
> (org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.stashRequestWaitingForResourceManager:369)
> - Cannot serve slot request, no ResourceManager connected. Adding as pending
> request [SlotRequestId{456c9daa6670a4490810f8e51f495174}]
> ```
> *1. We did not find any log entry of YarnResourceManager requesting
> containers in the attached jobmanager log.
> 2. The ZooKeeper node is also shown in the attachment.*