[ 
https://issues.apache.org/jira/browse/FLINK-32009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720799#comment-17720799
 ] 

Weihua Hu commented on FLINK-32009:
-----------------------------------

[~baibaiwuchang] 

The OutOfMemoryError ("OOM") was thrown by a ZooKeeper thread. As a result, the 
TaskManager could not retrieve the JobManager leader address from ZooKeeper, so 
it could not register with the JobManager in time.

I think this problem is caused by the OOM, so you can increase the memory of 
the JobManager. If the issue occurs again, you will need to check which 
component is occupying the memory.
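
As a rough sketch of both suggestions, assuming the cluster is configured 
through flink-conf.yaml: the first key raises the total JobManager process 
memory, and the second passes JVM options to the JobManager so that a heap dump 
is written if the OOM happens again. The size and the dump path below are 
placeholder values, not recommendations for this job.

{code}
# flink-conf.yaml (placeholder values, tune them for your workload)

# Total memory of the JobManager process (heap, off-heap, metaspace, overhead).
jobmanager.memory.process.size: 4096m

# JVM options for the JobManager only: write a heap dump on OutOfMemoryError.
# The dump path is a hypothetical example.
env.java.opts.jobmanager: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/jobmanager-oom.hprof
{code}

The resulting .hprof file can be opened with a heap analyzer (for example 
Eclipse MAT) to see which component holds the memory.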

 

I will also check the ZooKeeper HA-related code to see whether it is the cause. 
If so, I think we should let the JM exit early.
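
Independently of any change in Flink, one possible stopgap (an assumption, not 
tested against this job) is the standard HotSpot flag 
-XX:+ExitOnOutOfMemoryError, available since JDK 8u92. It makes the JVM exit on 
the first OutOfMemoryError instead of continuing in a broken state:

{code}
# flink-conf.yaml (sketch; append the flag to any JVM options already set)
env.java.opts.jobmanager: -XX:+ExitOnOutOfMemoryError
{code}

Whether the JobManager is then restarted depends on how the deployment is 
configured.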

> Slot request bulk is not fulfillable! Could not allocate the required slot 
> within slot request timeout
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-32009
>                 URL: https://issues.apache.org/jira/browse/FLINK-32009
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>    Affects Versions: 1.14.3
>            Reporter: hanjie
>            Priority: Major
>         Attachments: jobmanager.log, taskmanager.log
>
>
> The Flink task is stuck, but YARN resources are full.
> {code:java}
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
> at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86) ~[flink-dist_2.11-1.14.3.jar:1.14.3]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_191]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_191]
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:455) ~[flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:455) ~[flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213) ~[flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78) ~[flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163) ~[flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.actor.Actor.aroundReceive(Actor.scala:537) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.actor.Actor.aroundReceive$(Actor.scala:535) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.actor.ActorCell.invoke(ActorCell.scala:548) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.dispatch.Mailbox.run(Mailbox.scala:231) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:243) [flink-rpc-akka_a8b5fc62-4780-45aa-879e-076def164c9f.jar:1.14.3]
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_191]
> at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_191]
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_191]
> at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_191]
> Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: 300000 ms
> ... 29 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
