[https://issues.apache.org/jira/browse/FLINK-21597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Matthias closed FLINK-21597.
----------------------------
Resolution: Won't Fix
I looked at the logs once more, but it's hard to get anything out of them
without debug logs. The root cause seems to be an issue with allocating the
physical slot on the TaskManager side.
We ruled out the race condition addressed in FLINK-21751: in that case, the
timeout exception would have appeared much earlier (after 10s instead of 5 min).
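The timing argument comes down to simple arithmetic. The trace reports `Timeout has occurred: 300000 ms` (5 min), and the test ran for 300.117 s, whereas the FLINK-21751 scenario would be expected to fail after roughly 10 s. The Java sketch below compares the observed duration against both values. The class and method names are hypothetical, and the 10 s figure comes from the comment above, not from Flink's configuration.

```java
import java.time.Duration;

/** Worked arithmetic for the timeout argument (a sketch, not Flink code). */
public class TimeoutArithmetic {
    // Assumption: ~10 s is when the FLINK-21751 race condition would surface,
    // as stated in the issue comment; not read from any Flink config.
    static final Duration EXPECTED_IF_RACE_CONDITION = Duration.ofSeconds(10);
    // Slot request timeout reported in the stack trace: "Timeout has occurred: 300000 ms".
    static final Duration SLOT_REQUEST_TIMEOUT = Duration.ofMillis(300_000);

    /** True if the observed failure time is closer to the slot request timeout than to 10 s. */
    static boolean matchesSlotRequestTimeout(Duration observed) {
        long toSlotTimeout = Math.abs(observed.minus(SLOT_REQUEST_TIMEOUT).toMillis());
        long toRaceTimeout = Math.abs(observed.minus(EXPECTED_IF_RACE_CONDITION).toMillis());
        return toSlotTimeout < toRaceTimeout;
    }

    public static void main(String[] args) {
        // "Time elapsed: 300.117 s" as reported by the failing test run.
        Duration observed = Duration.ofMillis(300_117);
        System.out.println("slot request timeout = " + SLOT_REQUEST_TIMEOUT.toMinutes() + " min");
        System.out.println("matches slot request timeout: " + matchesSlotRequestTimeout(observed));
    }
}
```

The test's elapsed time sits only about 117 ms above the slot request timeout. This is consistent with the bulk timing out after the full 5 minutes rather than failing early.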
We decided to close this issue for now because nobody was able to reproduce the
failure and no debug logs are available. We should revisit it if this error
appears again.
> testMapAfterRepartitionHasCorrectParallelism2 Fail because of
> "NoResourceAvailableException"
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-21597
> URL: https://issues.apache.org/jira/browse/FLINK-21597
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.13.0
> Reporter: Guowei Ma
> Assignee: Matthias
> Priority: Major
> Labels: test-stability
> Attachments: FLINK-21597.log
>
>
> {code:java}
> 2021-03-04T00:17:41.2017402Z [ERROR]
> testMapAfterRepartitionHasCorrectParallelism2[Execution mode =
> CLUSTER](org.apache.flink.api.scala.operators.PartitionITCase) Time elapsed:
> 300.117 s <<< ERROR!
> 2021-03-04T00:17:41.2018058Z
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2021-03-04T00:17:41.2018525Z at
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> 2021-03-04T00:17:41.2019563Z at
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
> 2021-03-04T00:17:41.2020129Z at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> 2021-03-04T00:17:41.2021974Z at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2021-03-04T00:17:41.2022634Z at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2021-03-04T00:17:41.2023118Z at
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2021-03-04T00:17:41.2023682Z at
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:237)
> 2021-03-04T00:17:41.2024244Z at
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2021-03-04T00:17:41.2024749Z at
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2021-03-04T00:17:41.2025261Z at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2021-03-04T00:17:41.2026070Z at
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2021-03-04T00:17:41.2026814Z at
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1066)
> 2021-03-04T00:17:41.2027633Z at
> akka.dispatch.OnComplete.internal(Future.scala:264)
> 2021-03-04T00:17:41.2028245Z at
> akka.dispatch.OnComplete.internal(Future.scala:261)
> 2021-03-04T00:17:41.2028796Z at
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
> 2021-03-04T00:17:41.2029327Z at
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
> 2021-03-04T00:17:41.2030017Z at
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2021-03-04T00:17:41.2030795Z at
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73)
> 2021-03-04T00:17:41.2031885Z at
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> 2021-03-04T00:17:41.2032678Z at
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> 2021-03-04T00:17:41.2033428Z at
> akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
> 2021-03-04T00:17:41.2034197Z at
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
> 2021-03-04T00:17:41.2035094Z at
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
> 2021-03-04T00:17:41.2035915Z at
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
> 2021-03-04T00:17:41.2036617Z at
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
> 2021-03-04T00:17:41.2037537Z at
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2021-03-04T00:17:41.2038019Z at
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
> 2021-03-04T00:17:41.2038554Z at
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
> 2021-03-04T00:17:41.2039117Z at
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2021-03-04T00:17:41.2039671Z at
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2021-03-04T00:17:41.2040159Z at
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> 2021-03-04T00:17:41.2040632Z at
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
> 2021-03-04T00:17:41.2041086Z at
> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> 2021-03-04T00:17:41.2041810Z at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> 2021-03-04T00:17:41.2042514Z at
> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2021-03-04T00:17:41.2042977Z at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2021-03-04T00:17:41.2043425Z at
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2021-03-04T00:17:41.2043887Z at
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2021-03-04T00:17:41.2044399Z Caused by:
> org.apache.flink.runtime.JobException: Recovery is suppressed by
> NoRestartBackoffTimeStrategy
> 2021-03-04T00:17:41.2044991Z at
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:130)
> 2021-03-04T00:17:41.2045695Z at
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:81)
> 2021-03-04T00:17:41.2046343Z at
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:221)
> 2021-03-04T00:17:41.2047000Z at
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:212)
> 2021-03-04T00:17:41.2047579Z at
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:203)
> 2021-03-04T00:17:41.2048171Z at
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:696)
> 2021-03-04T00:17:41.2049092Z at
> org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:51)
> 2021-03-04T00:17:41.2049893Z at
> org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1470)
> 2021-03-04T00:17:41.2050492Z at
> org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1111)
> 2021-03-04T00:17:41.2050989Z at
> org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1051)
> 2021-03-04T00:17:41.2051474Z at
> org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:885)
> 2021-03-04T00:17:41.2052211Z at
> org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:661)
> 2021-03-04T00:17:41.2052877Z at
> org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41)
> 2021-03-04T00:17:41.2053654Z at
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:505)
> 2021-03-04T00:17:41.2054285Z at
> org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:490)
> 2021-03-04T00:17:41.2054838Z at
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
> 2021-03-04T00:17:41.2055323Z at
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
> 2021-03-04T00:17:41.2055805Z at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2021-03-04T00:17:41.2056318Z at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> 2021-03-04T00:17:41.2056943Z at
> org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:222)
> 2021-03-04T00:17:41.2057554Z at
> org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:164)
> 2021-03-04T00:17:41.2058220Z at
> org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:86)
> 2021-03-04T00:17:41.2058875Z at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66)
> 2021-03-04T00:17:41.2059642Z at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:91)
> 2021-03-04T00:17:41.2060319Z at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 2021-03-04T00:17:41.2060938Z at
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2021-03-04T00:17:41.2061472Z at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
> 2021-03-04T00:17:41.2062265Z at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
> 2021-03-04T00:17:41.2062824Z at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> 2021-03-04T00:17:41.2063375Z at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
> 2021-03-04T00:17:41.2063821Z at
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2021-03-04T00:17:41.2064246Z at
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2021-03-04T00:17:41.2064669Z at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2021-03-04T00:17:41.2065093Z at
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2021-03-04T00:17:41.2065537Z at
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2021-03-04T00:17:41.2065975Z at
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2021-03-04T00:17:41.2066390Z at
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2021-03-04T00:17:41.2066798Z at
> akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2021-03-04T00:17:41.2067249Z at
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2021-03-04T00:17:41.2067916Z at
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2021-03-04T00:17:41.2068415Z at
> akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2021-03-04T00:17:41.2068785Z at
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2021-03-04T00:17:41.2069166Z at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2021-03-04T00:17:41.2069523Z at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2021-03-04T00:17:41.2069784Z ... 4 more
> 2021-03-04T00:17:41.2070383Z Caused by:
> java.util.concurrent.CompletionException:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Slot request bulk is not fulfillable! Could not allocate the required slot
> within slot request timeout
> 2021-03-04T00:17:41.2071162Z at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> 2021-03-04T00:17:41.2071905Z at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> 2021-03-04T00:17:41.2072420Z at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
> 2021-03-04T00:17:41.2073089Z at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2021-03-04T00:17:41.2073462Z ... 31 more
> 2021-03-04T00:17:41.2073977Z Caused by:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Slot request bulk is not fulfillable! Could not allocate the required slot
> within slot request timeout
> 2021-03-04T00:17:41.2074838Z at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86)
> 2021-03-04T00:17:41.2075386Z ... 24 more
> 2021-03-04T00:17:41.2075706Z Caused by:
> java.util.concurrent.TimeoutException: Timeout has occurred: 300000 ms
> 2021-03-04T00:17:41.2076048Z ... 25 more
> {code}
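Reading the trace bottom-up, the slot request bulk checker gives up on a bulk that could not be fulfilled within the slot request timeout. It cancels the pending logical slot request with a `NoResourceAvailableException` caused by a `TimeoutException`, and the scheduler then fails the job because recovery is suppressed by `NoRestartBackoffTimeStrategy`. The Java sketch below is a simplified model of that path. It assumes the checker times out a bulk once the bulk has stayed unfulfillable for the whole slot request timeout. `BulkTimeoutModel`, `check`, `cancel` and `rootCause` are hypothetical names, and `NoResourceAvailableException` here is a local stand-in, not Flink's class or implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

/** Simplified model of the slot-request-bulk timeout path seen in the trace (not Flink code). */
public class BulkTimeoutModel {
    /** Hypothetical stand-in for org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException. */
    static class NoResourceAvailableException extends Exception {
        NoResourceAvailableException(String message, Throwable cause) { super(message, cause); }
    }

    enum Result { FULFILLED, PENDING, TIMEOUT }

    /** Assumption: a bulk times out once it has been unfulfillable for at least the slot request timeout. */
    static Result check(boolean fulfilled, long unfulfillableSinceMs, long nowMs, long slotRequestTimeoutMs) {
        if (fulfilled) {
            return Result.FULFILLED;
        }
        return nowMs - unfulfillableSinceMs >= slotRequestTimeoutMs ? Result.TIMEOUT : Result.PENDING;
    }

    /** Fails the pending slot future with the same exception nesting as the trace. */
    static void cancel(CompletableFuture<String> slotFuture, long slotRequestTimeoutMs) {
        slotFuture.completeExceptionally(new NoResourceAvailableException(
                "Slot request bulk is not fulfillable! Could not allocate the required slot"
                        + " within slot request timeout",
                new TimeoutException("Timeout has occurred: " + slotRequestTimeoutMs + " ms")));
    }

    /** Follows getCause() to the innermost throwable, i.e. the last "Caused by:" entry. */
    static Throwable rootCause(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    public static void main(String[] args) {
        long timeoutMs = 300_000L;
        CompletableFuture<String> slotFuture = new CompletableFuture<>();
        // A dependent stage, like the deployment step waiting on the slot; it fails with a CompletionException.
        CompletableFuture<String> deployment = slotFuture.thenApply(slot -> "deployed to " + slot);
        if (check(false, 0L, 300_000L, timeoutMs) == Result.TIMEOUT) {
            cancel(slotFuture, timeoutMs);
        }
        try {
            deployment.join();
        } catch (CompletionException e) {
            System.out.println(e.getCause().getClass().getSimpleName());
            System.out.println(rootCause(e).getMessage());
        }
    }
}
```

As in the trace, the dependent stage surfaces a `CompletionException` wrapping the `NoResourceAvailableException`, whose own cause is the `TimeoutException`. The trace alone does not show why the bulk stayed unfulfillable, which is the part that would have required the missing debug logs.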