[
https://issues.apache.org/jira/browse/FLINK-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003869#comment-17003869
]
hiliuxg edited comment on FLINK-15388 at 12/27/19 5:42 AM:
-----------------------------------------------------------
Hi [~xintongsong]:
The problem has reappeared. The network is fine, CPU usage is not high, and
there is no full GC, yet the heartbeat times out, the ZooKeeper response times
out, and Prometheus times out when fetching the metrics. Is it possible that a
certain operator of a job is blocked, such as the sink operator, causing the TM
response to time out? Or is my configuration unreasonable? A 32-core CPU is
configured with 48 slots and a 144 GB heap.
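
For reference, the ratios behind the configuration question can be worked out with a minimal sketch like the one below. The input values come from the reported environment. The function name `slot_ratios` and the result keys are hypothetical, and the managed-memory figure is only a rough upper-bound estimate that assumes the fraction applies to the whole heap and ignores network buffers and JVM overhead. The sketch does not show that the slot count causes the timeouts; it only makes the numbers explicit.

```python
# Values taken from the reported environment.
CPU_CORES = 32
TASK_SLOTS = 48          # taskmanager.numberOfTaskSlots
HEAP_GB = 144            # taskmanager.heap.size
MANAGED_FRACTION = 0.7   # taskmanager.memory.fraction


def slot_ratios(cores, slots, heap_gb, managed_fraction):
    """Return rough per-slot figures (hypothetical helper, not a Flink API).

    Assumption: the managed-memory fraction is applied to the whole heap,
    which overestimates the real value because network buffers are ignored.
    """
    return {
        "slots_per_core": slots / cores,
        "heap_gb_per_slot": heap_gb / slots,
        "managed_gb_estimate": heap_gb * managed_fraction,
    }


ratios = slot_ratios(CPU_CORES, TASK_SLOTS, HEAP_GB, MANAGED_FRACTION)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

With these inputs there are more slots than cores (1.5 slots per core) and about 3 GB of heap per slot.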
> The assigned slot bae00218c818157649eb9e3c533b86af_32 was removed.
> ------------------------------------------------------------------
>
> Key: FLINK-15388
> URL: https://issues.apache.org/jira/browse/FLINK-15388
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.8.0
> Environment: mode : standalone, not YARN
> version : Flink 1.8.0
> configuration :
> jobmanager.heap.size: 4096m
> taskmanager.heap.size: 144gb
> taskmanager.numberOfTaskSlots: 48
> taskmanager.memory.fraction: 0.7
> taskmanager.memory.off-heap: false
> parallelism.default: 1
>
> Reporter: hiliuxg
> Priority: Major
> Attachments: metrics.png, metrics.png
>
>
> The TaskManager's slot was removed. There was no full GC or OOM. What is
> the problem? The error is below:
> {code:java}
> org.apache.flink.util.FlinkException: The assigned slot bae00218c818157649eb9e3c533b86af_32 was removed.
> at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:893)
> at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:863)
> at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1058)
> at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:385)
> at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:847)
> at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener$1.run(ResourceManager.java:1161)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
> at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}