[
https://issues.apache.org/jira/browse/FLINK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140570#comment-17140570
]
Zhu Zhu commented on FLINK-18372:
---------------------------------
The logs are from
https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_apis/build/builds/8189/logs/115.
Thanks for the explanation [~trohrmann]! I think you are right that this can
happen if the TM established connection with the new JM before the previous
slots timeout. That's also why this problem was revealed in this HA test.
So it's possible that we directly fulfill requests in
{{SlotPoolImpl#waitingForResourceManager}}. There would be no orphaned
allocation in this case. We can fix this issue by simply check whether the
orphaned allocation is null before remapping it.
> NullPointerException can happen in SlotPoolImpl#maybeRemapOrphanedAllocation
> ----------------------------------------------------------------------------
>
> Key: FLINK-18372
> URL: https://issues.apache.org/jira/browse/FLINK-18372
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.12.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Critical
> Fix For: 1.12.0
>
>
> NullPointerException can happen in SlotPoolImpl#maybeRemapOrphanedAllocation,
> which indicates a bug.
> https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_apis/build/builds/8189/logs/115
> 6:07:07,950 [flink-akka.actor.default-dispatcher-7] WARN
> org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Slot
> offering to JobManager failed. Freeing the slots and returning them to the
> ResourceManager.
> java.lang.NullPointerException: null
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.maybeRemapOrphanedAllocation(SlotPoolImpl.java:599)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.tryFulfillSlotRequestOrMakeAvailable(SlotPoolImpl.java:564)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.offerSlot(SlotPoolImpl.java:701)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.offerSlots(SlotPoolImpl.java:625)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.jobmaster.JobMaster.offerSlots(JobMaster.java:541)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_242]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_242]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_242]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_242]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> [scala-library-2.11.12.jar:?]
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> [scala-library-2.11.12.jar:?]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> [scala-library-2.11.12.jar:?]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> [scala-library-2.11.12.jar:?]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> 16:07:07,977 [flink-akka.actor.default-dispatcher-7] INFO
> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot
> TaskSlot(index:0, state:ACTIVE, resource profile:
> ResourceProfile{cpuCores=0.5000000000000000, taskHeapMemory=64.000mb
> (67108864 bytes), taskOffHeapMemory=0 bytes, managedMemory=2.000mb (2097152
> bytes), networkMemory=1.563mb (1638400 bytes)}, allocationId:
> 4dcfe78bb09fcf1117bd0be11c039df9, jobId: 00a771c28c805577994e752b25bef01c).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)