[
https://issues.apache.org/jira/browse/FLINK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140409#comment-17140409
]
Zhu Zhu commented on FLINK-18372:
---------------------------------
Yes, the cause is most likely that a fulfilled request is not in
`SlotPoolImpl#pendingRequests` yet. However, I would expect it to be there
already, so this is a bit strange.
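To make the suspected failure mode concrete, here is a minimal sketch. It is
not the actual `SlotPoolImpl` code: the class `PendingRequestsSketch`, the
nested `PendingRequest` type, the method names and the string ids are all
hypothetical. It only shows the general pattern where a lookup for a request
that has not been registered yet returns null, and an unguarded dereference
then throws a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of a pending-request table.
// This is NOT Flink's SlotPoolImpl; all names are made up for illustration.
class PendingRequestsSketch {

    // A request that is still waiting for a slot, with the allocation it expects.
    static final class PendingRequest {
        final String requestId;
        final String allocationId;

        PendingRequest(String requestId, String allocationId) {
            this.requestId = requestId;
            this.allocationId = allocationId;
        }
    }

    // requestId -> pending request
    private final Map<String, PendingRequest> pendingRequests = new HashMap<>();

    public void register(String requestId, String allocationId) {
        pendingRequests.put(requestId, new PendingRequest(requestId, allocationId));
    }

    // Unguarded lookup: throws NullPointerException when the request
    // has not been registered (yet), because Map#get returns null.
    public String allocationOfUnguarded(String requestId) {
        return pendingRequests.get(requestId).allocationId;
    }

    // Guarded lookup: an unknown request yields an empty Optional instead.
    public Optional<String> allocationOfGuarded(String requestId) {
        return Optional.ofNullable(pendingRequests.get(requestId))
                .map(request -> request.allocationId);
    }

    public static void main(String[] args) {
        PendingRequestsSketch pool = new PendingRequestsSketch();
        pool.register("request-1", "allocation-1");

        System.out.println(pool.allocationOfGuarded("request-1"));
        System.out.println(pool.allocationOfGuarded("request-2"));

        try {
            pool.allocationOfUnguarded("request-2");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for a request that is not pending");
        }
    }
}
```

A guard like this would only hide the symptom. The open question remains why
the fulfilled request is missing from the pending requests at that point.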
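I also found the log lines below, which look odd: a task is already switched
to DEPLOYING at 16:07:07,916, meaning its slot request was fulfilled, before
the JM reports a successful registration at the RM at 16:07:07,953.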
{panel:title=My title}
2020-06-18T16:11:52.6078290Z 16:07:07,821 6321
[flink-akka.actor.default-dispatcher-5] INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager
akka.tcp://[email protected]:42309/user/rpc/resourcemanager_0(98b55e72529e946e2d02579002f34274)
2020-06-18T16:11:52.6079359Z 16:07:07,873 6373
[flink-akka.actor.default-dispatcher-7] INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager
address, beginning registration
2020-06-18T16:11:52.6080396Z 16:07:07,876 6376
[flink-akka.actor.default-dispatcher-7] INFO
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService [] -
Starting ZooKeeperLeaderRetrievalService
/leader/00a771c28c805577994e752b25bef01c/job_manager_lock.
2020-06-18T16:11:52.6081789Z 16:07:07,876 6376
[flink-akka.actor.default-dispatcher-7] INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
Registering job manager
[email protected]://[email protected]:42309/user/rpc/jobmanager_2
for job 00a771c28c805577994e752b25bef01c.
2020-06-18T16:11:52.6083414Z 16:07:07,907 6407
[flink-akka.actor.default-dispatcher-3] INFO
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
Registered job manager
[email protected]://[email protected]:42309/user/rpc/jobmanager_2
for job 00a771c28c805577994e752b25bef01c.
2020-06-18T16:11:52.6085074Z 16:07:07,916 6416
[flink-akka.actor.default-dispatcher-5] INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - DataSource (at
testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:165)
(org.apache.flink.api.java.io.ParallelIteratorInputFormat)) (1/4)
(d37b2d59a796947e855620e4a6b9c4a3) switched from SCHEDULED to DEPLOYING.
2020-06-18T16:11:52.6086831Z 16:07:07,917 6417
[flink-akka.actor.default-dispatcher-5] INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying
DataSource (at
testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:165)
(org.apache.flink.api.java.io.ParallelIteratorInputFormat)) (1/4) (attempt #0)
to b89abcc35a8876d889ddc6f6f87127ef @ 66481b3fda78 (dataPort=46239)
2020-06-18T16:11:52.6088244Z 16:07:07,953 6453
[flink-akka.actor.default-dispatcher-5] INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully
registered at ResourceManager, leader id: 98b55e72529e946e2d02579002f34274.
{panel}
> NullPointerException can happen in SlotPoolImpl#maybeRemapOrphanedAllocation
> ----------------------------------------------------------------------------
>
> Key: FLINK-18372
> URL: https://issues.apache.org/jira/browse/FLINK-18372
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.12.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Critical
> Fix For: 1.12.0
>
>
> NullPointerException can happen in SlotPoolImpl#maybeRemapOrphanedAllocation,
> which indicates a bug.
> https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_apis/build/builds/8189/logs/115
> 16:07:07,950 [flink-akka.actor.default-dispatcher-7] WARN
> org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Slot
> offering to JobManager failed. Freeing the slots and returning them to the
> ResourceManager.
> java.lang.NullPointerException: null
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.maybeRemapOrphanedAllocation(SlotPoolImpl.java:599)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.tryFulfillSlotRequestOrMakeAvailable(SlotPoolImpl.java:564)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.offerSlot(SlotPoolImpl.java:701)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.offerSlots(SlotPoolImpl.java:625)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.jobmaster.JobMaster.offerSlots(JobMaster.java:541)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_242]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_242]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_242]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_242]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> [scala-library-2.11.12.jar:?]
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> [scala-library-2.11.12.jar:?]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> [scala-library-2.11.12.jar:?]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> [scala-library-2.11.12.jar:?]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> at
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> [akka-actor_2.11-2.5.21.jar:2.5.21]
> 16:07:07,977 [flink-akka.actor.default-dispatcher-7] INFO
> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot
> TaskSlot(index:0, state:ACTIVE, resource profile:
> ResourceProfile{cpuCores=0.5000000000000000, taskHeapMemory=64.000mb
> (67108864 bytes), taskOffHeapMemory=0 bytes, managedMemory=2.000mb (2097152
> bytes), networkMemory=1.563mb (1638400 bytes)}, allocationId:
> 4dcfe78bb09fcf1117bd0be11c039df9, jobId: 00a771c28c805577994e752b25bef01c).