[ 
https://issues.apache.org/jira/browse/FLINK-21853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guowei Ma updated FLINK-21853:
------------------------------
    Comment: was deleted

(was: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15601&view=logs&j=4dd4dbdd-1802-5eb7-a518-6acd9d24d0fc&t=8d6b4dd3-4ca1-5611-1743-57a7d76b395a&l=1675

>From the log(flink-vsts-standalonejob-3-fv-az227-245.log) there are failures 
>of submitting the task , which might lead to checkpoint could not success on 
>time.


{code:java}
2021-03-27 22:25:36,398 WARN  
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] 
- Slot allocation for slot 10.1.0.4:43337-a86692_2 for job 
00000000000000000000000000000000 failed.
java.util.concurrent.TimeoutException: Invocation of public abstract 
java.util.concurrent.CompletableFuture 
org.apache.flink.runtime.taskexecutor.TaskExecutorGateway.requestSlot(org.apache.flink.runtime.clusterframework.types.SlotID,org.apache.flink.api.common.JobID,org.apache.flink.runtime.clusterframework.types.AllocationID,org.apache.flink.runtime.clusterframework.types.ResourceProfile,java.lang.String,org.apache.flink.runtime.resourcemanager.ResourceManagerId,org.apache.flink.api.common.time.Time)
 timed out.
        at 
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.allocateSlot(DeclarativeSlotManager.java:561)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.internalTryAllocateSlots(DeclarativeSlotManager.java:511)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.tryAllocateSlotsForJob(DeclarativeSlotManager.java:477)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.checkResourceRequirements(DeclarativeSlotManager.java:444)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.unregisterTaskManager(DeclarativeSlotManager.java:346)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:1078)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1488)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:111)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_282]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_282]
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka.tcp://[email protected]:43337/user/rpc/taskmanager_0#-1913811355]] 
after [10000 ms]. Message of type 
[org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation]. A typical reason 
for `AskTimeoutException` is that the recipient actor didn't send a reply.
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at 
akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]

{code}
)

> Running HA per-job cluster (rocks, non-incremental) end-to-end test could not 
> finished in 900 seconds
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21853
>                 URL: https://issues.apache.org/jira/browse/FLINK-21853
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / State Backends
>    Affects Versions: 1.11.3, 1.13.0
>            Reporter: Guowei Ma
>            Priority: Major
>              Labels: test-stability
>         Attachments: flink-vsts-standalonejob-3-fv-az227-245.log.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14921&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=03dbd840-5430-533d-d1a7-05d0ebe03873&l=7318
> {code:java}
> Waiting for text Completed checkpoint [1-9]* for job 
> 00000000000000000000000000000000 to appear 2 of times in logs...
> grep: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*standalonejob-2*.log:
>  No such file or directory
> grep: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*standalonejob-2*.log:
>  No such file or directory
> grep: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*standalonejob-2*.log:
>  No such file or directory
> Starting standalonejob daemon on host fv-az232-135.
> grep: 
> /home/vsts/work/1/s/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*standalonejob-2*.log:
>  No such file or directory
> Killed TM @ 15744
> Killed TM @ 19625
> Test (pid: 9232) did not finish after 900 seconds.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to