[ 
https://issues.apache.org/jira/browse/FLINK-22566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340817#comment-17340817
 ] 

Matthias commented on FLINK-22566:
----------------------------------

I had some discussion about it with [~fly_in_gis]. The NodeManager logs might 
have been helpful in this case. The NodeManager is in charge of downloading the 
JARs before actually starting the TaskManagers. The NodeManager logs are 
located on the worker nodes, which we haven't accessed so far. I added commits 
to collect those logs as well.

The initial idea was to increase the timeout as well, but I didn't increase it 
for now. We might want to understand the issue before increasing the timeout. 
It could be an infrastructure problem; in that case, increasing the timeout 
would make sense. I'm just afraid that it's a different problem which we're not 
aware of right now. Increasing the timeout in that case would just mask it. I'd 
rather run into the same problem again and investigate the NodeManager logs 
next time.
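To illustrate the concern about masking the problem, here is a minimal, hypothetical Java sketch of a resource request guarded by a timeout. It is similar in spirit to the 120000 ms slot request timeout in the quoted stack trace, but it is not Flink's `PhysicalSlotRequestBulkCheckerImpl`. The class `BulkTimeoutSketch` and its methods `scheduleTimeoutCheck` and `outcome` are invented for illustration. The sketch assumes two kinds of failure: a slow resource that eventually arrives, and a resource that never arrives (for example, because the TaskManager is never started).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, simplified model of a pending resource request guarded by a
// timeout. It is NOT Flink's PhysicalSlotRequestBulkCheckerImpl; it only shows
// why a longer timeout helps with slow resources but not with missing ones.
class BulkTimeoutSketch {

    // Schedules a check that fails the request if it is still pending after timeoutMs.
    static void scheduleTimeoutCheck(
            CompletableFuture<String> request, long timeoutMs, ScheduledExecutorService scheduler) {
        scheduler.schedule(
                () -> request.completeExceptionally(
                        new TimeoutException("Timeout has occurred: " + timeoutMs + " ms")),
                timeoutMs,
                TimeUnit.MILLISECONDS);
    }

    // Returns "slot" if the resource arrives in time, otherwise the failure's class name.
    // A negative arrivalMs models a resource that never arrives.
    static String outcome(long arrivalMs, long timeoutMs) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        try {
            CompletableFuture<String> request = new CompletableFuture<>();
            if (arrivalMs >= 0) {
                // Simulated resource (e.g. a TaskManager slot) arriving after arrivalMs.
                scheduler.schedule(() -> request.complete("slot"), arrivalMs, TimeUnit.MILLISECONDS);
            }
            scheduleTimeoutCheck(request, timeoutMs, scheduler);
            // Whichever happens first (arrival or timeout) completes the future.
            return request.get();
        } catch (ExecutionException e) {
            return e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "Interrupted";
        } finally {
            // Cancels whichever scheduled task has not run yet.
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(300, 50));    // slow resource, short timeout
        System.out.println(outcome(300, 1000));  // slow resource, longer timeout
        System.out.println(outcome(-1, 1000));   // resource never arrives
    }
}
```

With these timings, only the "slow resource" case is fixed by the longer timeout. When the resource never arrives, the request still fails with a `TimeoutException`, just later. This is why raising the timeout would hide, rather than explain, a problem such as the TaskManagers never being started.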

> Running Kerberized YARN application on Docker test (default input) fails with no resources
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-22566
>                 URL: https://issues.apache.org/jira/browse/FLINK-22566
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>    Affects Versions: 1.13.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Matthias
>            Priority: Blocker
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17558&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529&l=8745
> {code}
> May 05 01:29:04 Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: 120000 ms
> May 05 01:29:04       at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_292]
> May 05 01:29:04       at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_292]
> May 05 01:29:04       at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.actor.Actor$class.aroundReceive(Actor.scala:517) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.actor.ActorCell.invoke(ActorCell.scala:561) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.dispatch.Mailbox.run(Mailbox.scala:225) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       at akka.dispatch.Mailbox.exec(Mailbox.scala:235) ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
> May 05 01:29:04       ... 4 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
