[ https://issues.apache.org/jira/browse/FLINK-20005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-20005:
----------------------------------
    Fix Version/s: 1.12.0

> "Kerberized YARN application" test unstable
> -------------------------------------------
>
>                 Key: FLINK-20005
>                 URL: https://issues.apache.org/jira/browse/FLINK-20005
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Coordination
>    Affects Versions: 1.12.0
>            Reporter: Robert Metzger
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9066&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
> The {{Running Kerberized YARN application on Docker test (default input)}} is failing.
> The following exceptions appear in the logs:
> {code}
> 2020-11-05T14:22:29.3315695Z Nov 05 14:22:29 2020-11-05 14:21:52,696 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Flat Map (2/3) (7806b7a7074425c5ff0906befd94e122) switched from SCHEDULED to FAILED on not deployed.
> 2020-11-05T14:22:29.3318307Z Nov 05 14:22:29 java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
> 2020-11-05T14:22:29.3320512Z Nov 05 14:22:29  at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_272]
> 2020-11-05T14:22:29.3322173Z Nov 05 14:22:29  at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_272]
> 2020-11-05T14:22:29.3323809Z Nov 05 14:22:29  at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_272]
> 2020-11-05T14:22:29.3325448Z Nov 05 14:22:29  at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_272]
> 2020-11-05T14:22:29.3331094Z Nov 05 14:22:29  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_272]
> 2020-11-05T14:22:29.3332769Z Nov 05 14:22:29  at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_272]
> 2020-11-05T14:22:29.3335736Z Nov 05 14:22:29  at org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:195) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3342621Z Nov 05 14:22:29  at org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:147) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3348463Z Nov 05 14:22:29  at org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3353749Z Nov 05 14:22:29  at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3362495Z Nov 05 14:22:29  at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:87) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3366937Z Nov 05 14:22:29  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_272]
> 2020-11-05T14:22:29.3370686Z Nov 05 14:22:29  at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_272]
> 2020-11-05T14:22:29.3380715Z Nov 05 14:22:29  at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3384436Z Nov 05 14:22:29  at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3387431Z Nov 05 14:22:29  at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3390333Z Nov 05 14:22:29  at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3392937Z Nov 05 14:22:29  at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3395430Z Nov 05 14:22:29  at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3397949Z Nov 05 14:22:29  at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3401799Z Nov 05 14:22:29  at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3449637Z Nov 05 14:22:29  at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3452289Z Nov 05 14:22:29  at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3454833Z Nov 05 14:22:29  at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3458801Z Nov 05 14:22:29  at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3469564Z Nov 05 14:22:29  at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3472736Z Nov 05 14:22:29  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3475094Z Nov 05 14:22:29  at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3478753Z Nov 05 14:22:29  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3497848Z Nov 05 14:22:29  at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3516200Z Nov 05 14:22:29  at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3519594Z Nov 05 14:22:29  at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3522331Z Nov 05 14:22:29  at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3524990Z Nov 05 14:22:29  at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3528102Z Nov 05 14:22:29  at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3530334Z Nov 05 14:22:29 Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
> 2020-11-05T14:22:29.3534080Z Nov 05 14:22:29  at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3536451Z Nov 05 14:22:29  ... 24 more
> 2020-11-05T14:22:29.3537535Z Nov 05 14:22:29 Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: 120000 ms
> 2020-11-05T14:22:29.3540969Z Nov 05 14:22:29  at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> 2020-11-05T14:22:29.3542868Z Nov 05 14:22:29  ... 24 more
> {code}
> {code}
> 2020-11-05T14:22:14.3964651Z Nov 05 14:22:13 20/11/05 14:21:55 INFO rmapp.RMAppImpl: application_1604585664395_0001 State change from RUNNING to FINAL_SAVING on event=ATTEMPT_FAILED
> 2020-11-05T14:22:14.3966539Z Nov 05 14:22:13 20/11/05 14:21:55 INFO recovery.RMStateStore: Updating info for app: application_1604585664395_0001
> 2020-11-05T14:22:14.3968255Z Nov 05 14:22:13 20/11/05 14:21:55 INFO capacity.CapacityScheduler: Application Attempt appattempt_1604585664395_0001_000001 is done. finalState=FAILED
> 2020-11-05T14:22:14.3970618Z Nov 05 14:22:13 20/11/05 14:21:55 INFO rmapp.RMAppImpl: Application application_1604585664395_0001 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1604585664395_0001_000001 exited with  exitCode: 2
> 2020-11-05T14:22:14.3973331Z Nov 05 14:22:13 Failing this attempt.Diagnostics: Exception from container-launch.
> 2020-11-05T14:22:14.3974475Z Nov 05 14:22:13 Container id: container_1604585664395_0001_01_000001
> 2020-11-05T14:22:14.3975384Z Nov 05 14:22:13 Exit code: 2
> 2020-11-05T14:22:14.3976946Z Nov 05 14:22:13 Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
> 2020-11-05T14:22:14.3979115Z Nov 05 14:22:13  at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:112)
> 2020-11-05T14:22:14.3981642Z Nov 05 14:22:13  at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:130)
> 2020-11-05T14:22:14.3983756Z Nov 05 14:22:13  at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:395)
> 2020-11-05T14:22:14.3985627Z Nov 05 14:22:13  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
> 2020-11-05T14:22:14.3987444Z Nov 05 14:22:13  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
> 2020-11-05T14:22:14.3989017Z Nov 05 14:22:13  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-11-05T14:22:14.3990393Z Nov 05 14:22:13  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2020-11-05T14:22:14.3991866Z Nov 05 14:22:13  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-11-05T14:22:14.3993133Z Nov 05 14:22:13  at java.lang.Thread.run(Thread.java:748)
> 2020-11-05T14:22:14.3993947Z Nov 05 14:22:13 
> 2020-11-05T14:22:14.3994706Z Nov 05 14:22:13 Shell output: main : command provided 1
> {code}
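
The first log excerpt nests three exceptions: a {{CompletionException}} wrapping a {{NoResourceAvailableException}}, whose cause is a {{TimeoutException}} after 120000 ms. The following minimal Java sketch, which assumes nothing beyond the JDK, reproduces that nesting: a pending request future is failed with a timeout-derived exception, and a dependent {{thenApply}} stage then sees it wrapped in a {{CompletionException}}, which is consistent with the {{encodeThrowable}}/{{uniApply}} frames at the top of the trace. The nested {{NoResourceAvailableException}} class, the {{failPendingRequest}} helper and the "deployment" stage are hypothetical stand-ins, not Flink's classes, and the timeout is simulated rather than measured.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

public class SlotTimeoutChainSketch {

    // Hypothetical stand-in for Flink's NoResourceAvailableException (not the real class).
    static class NoResourceAvailableException extends Exception {
        NoResourceAvailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Returns what a dependent stage observes once the pending slot request is failed by a timeout.
    static Throwable failPendingRequest(long timeoutMillis) {
        // A slot request that is never fulfilled.
        CompletableFuture<String> slotRequest = new CompletableFuture<>();
        // Hypothetical dependent stage chained with thenApply.
        CompletableFuture<String> deployment = slotRequest.thenApply(slot -> "deployed on " + slot);

        // Simulate a timeout checker failing the request once its deadline has passed.
        TimeoutException timeout = new TimeoutException("Timeout has occurred: " + timeoutMillis + " ms");
        slotRequest.completeExceptionally(new NoResourceAvailableException(
                "Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout",
                timeout));

        try {
            deployment.join();
            return null; // not reached: the stage completed exceptionally
        } catch (CompletionException e) {
            // thenApply propagated the failure wrapped in a CompletionException.
            return e;
        }
    }

    public static void main(String[] args) {
        // Walk the cause chain, outermost exception first.
        for (Throwable t = failPendingRequest(120_000L); t != null; t = t.getCause()) {
            System.out.println(t.getClass().getSimpleName() + ": " + t.getMessage());
        }
    }
}
```

The outer {{CompletionException}} is added by {{CompletableFuture}} itself, so when reading such traces the interesting part is usually the innermost "Caused by", here the 120000 ms timeout.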



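The ResourceManager log shows the application ending after its first failed AM attempt, even though the global limit is 2. Below is a simplified Java model of the rule as I understand YARN applies it (an assumption, not YARN's code): the application's own requested limit (the "local limit") is honored when it is positive and not larger than the cluster-wide maximum ({{yarn.resourcemanager.am.max-attempts}}, the "global limit"); otherwise the global limit applies. The class and method names are hypothetical, and the model ignores attempt failures that YARN does not count toward the limit.

```java
public class AmAttemptLimitSketch {

    // Effective AM attempt limit: the local (per-application) value is used only if it
    // is positive and does not exceed the global (cluster-wide) value.
    static int effectiveMaxAttempts(int globalLimit, int localLimit) {
        if (localLimit <= 0 || localLimit > globalLimit) {
            return globalLimit;
        }
        return localLimit;
    }

    // True if another AM attempt may be started after failedAttempts counted failures.
    static boolean mayRetry(int failedAttempts, int globalLimit, int localLimit) {
        return failedAttempts < effectiveMaxAttempts(globalLimit, localLimit);
    }

    public static void main(String[] args) {
        // Values from the RMAppImpl log line: 1 failure, global limit 2, local limit 1.
        System.out.println(mayRetry(1, 2, 1)); // prints false
    }
}
```

With the values from the log, {{mayRetry}} returns {{false}}, which is consistent with the RUNNING to FINAL_SAVING transition right after the first ATTEMPT_FAILED event: the AM container exiting with code 2 was enough to fail the whole application.
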
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
