[ https://issues.apache.org/jira/browse/FLINK-21597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias closed FLINK-21597.
----------------------------
    Resolution: Won't Fix

I looked at the logs once more, but it's hard to get anything out of them 
without debug logs. The root cause seems to be an issue in the allocation 
of the physical slot on the TaskManager's side.

We ruled out the race condition addressed in FLINK-21751: if that had been the 
cause, the timeout exception would have appeared much earlier (after 10 s 
instead of 5 min).
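
The timing argument can be illustrated with a minimal sketch. It assumes a simplified model in which a pending slot request is only failed when a scheduled timeout check finds it still unfulfilled. `SlotRequestTimeoutSketch` and `SlotUnavailableException` are hypothetical stand-ins, not Flink's `PhysicalSlotRequestBulkCheckerImpl` or `NoResourceAvailableException`. Under this model, the error surfaces exactly when the configured timeout elapses. In the quoted trace that was 300000 ms (5 min), which is also the default of Flink's `slot.request.timeout` option.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for Flink's NoResourceAvailableException (not the real class).
class SlotUnavailableException extends Exception {
    SlotUnavailableException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical sketch, not Flink's implementation: a slot request that is failed
// only after timeoutMillis has elapsed without anyone fulfilling it.
class SlotRequestTimeoutSketch {
    static CompletableFuture<String> requestSlot(
            ScheduledExecutorService scheduler, long timeoutMillis) {
        CompletableFuture<String> pending = new CompletableFuture<>();
        scheduler.schedule(
                () -> {
                    // completeExceptionally has no effect if the request was
                    // already fulfilled before the timeout check ran.
                    pending.completeExceptionally(new SlotUnavailableException(
                            "Could not allocate the required slot within slot request timeout",
                            new TimeoutException("Timeout has occurred: " + timeoutMillis + " ms")));
                },
                timeoutMillis,
                TimeUnit.MILLISECONDS);
        return pending;
    }
}
```

For example, `requestSlot(scheduler, 50)` with nothing completing the future fails after roughly 50 ms with a `SlotUnavailableException` caused by a `TimeoutException` reading "Timeout has occurred: 50 ms". A shorter timeout (such as the 10 s mentioned above) would make the same failure show up correspondingly earlier.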

We decided to close this issue for now, as nobody was able to reproduce the 
failure and no debug logs are available. We should revisit it if this error 
appears again.

> testMapAfterRepartitionHasCorrectParallelism2 Fail because of 
> "NoResourceAvailableException" 
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21597
>                 URL: https://issues.apache.org/jira/browse/FLINK-21597
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0
>            Reporter: Guowei Ma
>            Assignee: Matthias
>            Priority: Major
>              Labels: test-stability
>         Attachments: FLINK-21597.log
>
>
> {code:java}
> 2021-03-04T00:17:41.2017402Z [ERROR] 
> testMapAfterRepartitionHasCorrectParallelism2[Execution mode = 
> CLUSTER](org.apache.flink.api.scala.operators.PartitionITCase)  Time elapsed: 
> 300.117 s  <<< ERROR!
> 2021-03-04T00:17:41.2018058Z 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2021-03-04T00:17:41.2018525Z  at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> 2021-03-04T00:17:41.2019563Z  at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
> 2021-03-04T00:17:41.2020129Z  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> 2021-03-04T00:17:41.2021974Z  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2021-03-04T00:17:41.2022634Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2021-03-04T00:17:41.2023118Z  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2021-03-04T00:17:41.2023682Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:237)
> 2021-03-04T00:17:41.2024244Z  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2021-03-04T00:17:41.2024749Z  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2021-03-04T00:17:41.2025261Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2021-03-04T00:17:41.2026070Z  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2021-03-04T00:17:41.2026814Z  at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1066)
> 2021-03-04T00:17:41.2027633Z  at 
> akka.dispatch.OnComplete.internal(Future.scala:264)
> 2021-03-04T00:17:41.2028245Z  at 
> akka.dispatch.OnComplete.internal(Future.scala:261)
> 2021-03-04T00:17:41.2028796Z  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
> 2021-03-04T00:17:41.2029327Z  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
> 2021-03-04T00:17:41.2030017Z  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2021-03-04T00:17:41.2030795Z  at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73)
> 2021-03-04T00:17:41.2031885Z  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> 2021-03-04T00:17:41.2032678Z  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> 2021-03-04T00:17:41.2033428Z  at 
> akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
> 2021-03-04T00:17:41.2034197Z  at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
> 2021-03-04T00:17:41.2035094Z  at 
> akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
> 2021-03-04T00:17:41.2035915Z  at 
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
> 2021-03-04T00:17:41.2036617Z  at 
> scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
> 2021-03-04T00:17:41.2037537Z  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> 2021-03-04T00:17:41.2038019Z  at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
> 2021-03-04T00:17:41.2038554Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
> 2021-03-04T00:17:41.2039117Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2021-03-04T00:17:41.2039671Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> 2021-03-04T00:17:41.2040159Z  at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> 2021-03-04T00:17:41.2040632Z  at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
> 2021-03-04T00:17:41.2041086Z  at 
> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> 2021-03-04T00:17:41.2041810Z  at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> 2021-03-04T00:17:41.2042514Z  at 
> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2021-03-04T00:17:41.2042977Z  at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2021-03-04T00:17:41.2043425Z  at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2021-03-04T00:17:41.2043887Z  at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2021-03-04T00:17:41.2044399Z Caused by: 
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> NoRestartBackoffTimeStrategy
> 2021-03-04T00:17:41.2044991Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:130)
> 2021-03-04T00:17:41.2045695Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:81)
> 2021-03-04T00:17:41.2046343Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:221)
> 2021-03-04T00:17:41.2047000Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:212)
> 2021-03-04T00:17:41.2047579Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:203)
> 2021-03-04T00:17:41.2048171Z  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:696)
> 2021-03-04T00:17:41.2049092Z  at 
> org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:51)
> 2021-03-04T00:17:41.2049893Z  at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1470)
> 2021-03-04T00:17:41.2050492Z  at 
> org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1111)
> 2021-03-04T00:17:41.2050989Z  at 
> org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1051)
> 2021-03-04T00:17:41.2051474Z  at 
> org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:885)
> 2021-03-04T00:17:41.2052211Z  at 
> org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:661)
> 2021-03-04T00:17:41.2052877Z  at 
> org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41)
> 2021-03-04T00:17:41.2053654Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:505)
> 2021-03-04T00:17:41.2054285Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:490)
> 2021-03-04T00:17:41.2054838Z  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
> 2021-03-04T00:17:41.2055323Z  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
> 2021-03-04T00:17:41.2055805Z  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2021-03-04T00:17:41.2056318Z  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> 2021-03-04T00:17:41.2056943Z  at 
> org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:222)
> 2021-03-04T00:17:41.2057554Z  at 
> org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:164)
> 2021-03-04T00:17:41.2058220Z  at 
> org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:86)
> 2021-03-04T00:17:41.2058875Z  at 
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66)
> 2021-03-04T00:17:41.2059642Z  at 
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:91)
> 2021-03-04T00:17:41.2060319Z  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 2021-03-04T00:17:41.2060938Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2021-03-04T00:17:41.2061472Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
> 2021-03-04T00:17:41.2062265Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
> 2021-03-04T00:17:41.2062824Z  at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> 2021-03-04T00:17:41.2063375Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
> 2021-03-04T00:17:41.2063821Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2021-03-04T00:17:41.2064246Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2021-03-04T00:17:41.2064669Z  at 
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2021-03-04T00:17:41.2065093Z  at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2021-03-04T00:17:41.2065537Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2021-03-04T00:17:41.2065975Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2021-03-04T00:17:41.2066390Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2021-03-04T00:17:41.2066798Z  at 
> akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2021-03-04T00:17:41.2067249Z  at 
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2021-03-04T00:17:41.2067916Z  at 
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2021-03-04T00:17:41.2068415Z  at 
> akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2021-03-04T00:17:41.2068785Z  at 
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2021-03-04T00:17:41.2069166Z  at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2021-03-04T00:17:41.2069523Z  at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2021-03-04T00:17:41.2069784Z  ... 4 more
> 2021-03-04T00:17:41.2070383Z Caused by: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
> Slot request bulk is not fulfillable! Could not allocate the required slot 
> within slot request timeout
> 2021-03-04T00:17:41.2071162Z  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> 2021-03-04T00:17:41.2071905Z  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> 2021-03-04T00:17:41.2072420Z  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
> 2021-03-04T00:17:41.2073089Z  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2021-03-04T00:17:41.2073462Z  ... 31 more
> 2021-03-04T00:17:41.2073977Z Caused by: 
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
> Slot request bulk is not fulfillable! Could not allocate the required slot 
> within slot request timeout
> 2021-03-04T00:17:41.2074838Z  at 
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86)
> 2021-03-04T00:17:41.2075386Z  ... 24 more
> 2021-03-04T00:17:41.2075706Z Caused by: 
> java.util.concurrent.TimeoutException: Timeout has occurred: 300000 ms
> 2021-03-04T00:17:41.2076048Z  ... 25 more
> {code}
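
The quoted trace is a chain of wrapped exceptions: `JobExecutionException` wraps `JobException`, which wraps a `CompletionException` around `NoResourceAvailableException`, whose innermost cause is the `TimeoutException` ("Timeout has occurred: 300000 ms"). When asserting on such failures in a test, it can help to walk the chain down to the root cause. Below is a minimal JDK-only sketch. `CauseChain` is a hypothetical helper, not a Flink API; Flink's `org.apache.flink.util.ExceptionUtils` offers comparable utilities such as `findThrowable`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Flink API): walks a throwable's cause chain.
class CauseChain {
    // Simple class names from the outermost exception down to the root cause.
    static List<String> classNames(Throwable throwable) {
        List<String> names = new ArrayList<>();
        // Identity-based set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = throwable; t != null && seen.add(t); t = t.getCause()) {
            names.add(t.getClass().getSimpleName());
        }
        return names;
    }

    // The innermost cause of a non-null throwable, e.g. the TimeoutException
    // at the bottom of the quoted trace.
    static Throwable rootCause(Throwable throwable) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable root = throwable;
        seen.add(root);
        while (root.getCause() != null && seen.add(root.getCause())) {
            root = root.getCause();
        }
        return root;
    }
}
```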



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
