[ https://issues.apache.org/jira/browse/FLINK-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356280#comment-17356280 ]
Matthias commented on FLINK-22819:
----------------------------------
I couldn't find anything specific during my initial investigation (I attached
the test's logs). We don't get any additional YARN logs because the failure
happens during application deployment. As stated in the error messages, the
deployment runs into a timeout:
```
23:13:00,816 [ContainersLauncher #0] INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor [] - launchContainer: [bash, /__w/3/s/flink-yarn-tests/target/flink-yarn-tests-per-job/flink-yarn-tests-per-job-localDir-nm-0_0/usercache/agent0
23:13:12,994 [ Ping Checker] INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor [] - Expired:appattempt_1622502732791_0001_000001 Timed out after 20 secs
23:13:12,996 [AsyncDispatcher event handler] INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl [] - Updating application attempt appattempt_1622502732791_0001_000001 with final state: FAILED, and exit status: -1000
```
Comparing it with a successful test from the same build, it appears that some
time (5 secs here) is spent on authentication:
```
23:15:52,955 [ContainersLauncher #0] INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor [] - launchContainer: [bash, /__w/3/s/flink-yarn-tests/target/flink-yarn-tests-capacityscheduler/flink-yarn-tests-capacityscheduler-localDir-nm-1_0/usercache/agent03_azpcontainer/appcache/application_1622502943279_0001/container_1622502943279_0001_01_000001/default_container_executor.sh]
23:15:57,954 [Socket Reader #1 for port 44617] INFO SecurityLogger.org.apache.hadoop.ipc.Server [] - Auth successful for appattempt_1622502943279_0001_000001 (auth:SIMPLE)
23:15:57,977 [IPC Server handler 0 on 44617] INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService [] - AM registration appattempt_1622502943279_0001_000001
```
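To make the timing comparison concrete, the gaps between the quoted log timestamps can be computed directly. The following is a minimal Java sketch. The class name `LogGap` and the method `gapMillis` are made up for illustration. The sketch only measures the distance between log lines. It makes no claim about when YARN's liveness monitor actually starts its 20 sec window.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: measures the gap between two "HH:mm:ss,SSS" log timestamps.
final class LogGap {
    // Matches the timestamp prefix of the log lines quoted in this comment.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Returns the elapsed milliseconds from 'from' to 'to' (same day assumed).
    static long gapMillis(String from, String to) {
        return Duration.between(LocalTime.parse(from, TS), LocalTime.parse(to, TS)).toMillis();
    }

    public static void main(String[] args) {
        // Successful run: container launch -> "Auth successful" -> "AM registration"
        System.out.println("launch -> auth:     " + gapMillis("23:15:52,955", "23:15:57,954") + " ms");
        System.out.println("auth -> register:   " + gapMillis("23:15:57,954", "23:15:57,977") + " ms");
        // Failed run: container launch -> liveness monitor "Expired" (no auth line in between)
        System.out.println("launch -> expired:  " + gapMillis("23:13:00,816", "23:13:12,994") + " ms");
    }
}
```

For the successful run, this yields 4999 ms from container launch to authentication and 23 ms from authentication to AM registration. For the failed run, it yields 12178 ms from the container launch line to the expiry line. These figures can be compared with the 20 secs reported by `AbstractLivelinessMonitor`.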
> YARNFileReplicationITCase fails with "The YARN application unexpectedly switched to state FAILED during deployment"
> -------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-22819
> URL: https://issues.apache.org/jira/browse/FLINK-22819
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Affects Versions: 1.13.1
> Reporter: Dawid Wysakowicz
> Assignee: Matthias
> Priority: Major
> Labels: test-stability
> Fix For: 1.14.0
>
> Attachments:
> FLINK-22819-YARNFileReplicationITCase-testPerJobModeWithDefaultFileReplication.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18467&view=logs&j=8fd975ef-f478-511d-4997-6f15fe8a1fd3&t=ac0fa443-5d45-5a6b-3597-0310ecc1d2ab&l=32007
> {code}
> May 31 23:14:22 org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
> May 31 23:14:22 at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)
> May 31 23:14:22 at org.apache.flink.yarn.YARNFileReplicationITCase.deployPerJob(YARNFileReplicationITCase.java:106)
> May 31 23:14:22 at org.apache.flink.yarn.YARNFileReplicationITCase.lambda$testPerJobModeWithDefaultFileReplication$1(YARNFileReplicationITCase.java:78)
> May 31 23:14:22 at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:287)
> May 31 23:14:22 at org.apache.flink.yarn.YARNFileReplicationITCase.testPerJobModeWithDefaultFileReplication(YARNFileReplicationITCase.java:78)
> May 31 23:14:22 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> May 31 23:14:22 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 31 23:14:22 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 31 23:14:22 at java.lang.reflect.Method.invoke(Method.java:498)
> May 31 23:14:22 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> May 31 23:14:22 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> May 31 23:14:22 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> May 31 23:14:22 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> May 31 23:14:22 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> May 31 23:14:22 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> May 31 23:14:22 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> May 31 23:14:22 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> May 31 23:14:22 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 31 23:14:22 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> May 31 23:14:22 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> May 31 23:14:22 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> May 31 23:14:22 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> May 31 23:14:22 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> May 31 23:14:22 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> May 31 23:14:22 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> May 31 23:14:22 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> May 31 23:14:22 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> May 31 23:14:22 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> May 31 23:14:22 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 31 23:14:22 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> May 31 23:14:22 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> May 31 23:14:22 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> May 31 23:14:22 at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> May 31 23:14:22 at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> May 31 23:14:22 at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> May 31 23:14:22 at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> May 31 23:14:22 at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> May 31 23:14:22 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> May 31 23:14:22 at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> May 31 23:14:22 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> May 31 23:14:22 Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
> May 31 23:14:22 Diagnostics from YARN: Application application_1622502732791_0001 failed 1 times (global limit =2; local limit is =1) due to ApplicationMaster for attempt appattempt_1622502732791_0001_000001 timed out. Failing the application.
> May 31 23:14:22 If log aggregation is enabled on your cluster, use this command to further investigate the issue:
> May 31 23:14:22 yarn logs -applicationId application_1622502732791_0001
> May 31 23:14:22 at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1201)
> May 31 23:14:22 at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:593)
> May 31 23:14:22 at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:474)
> May 31 23:14:22 ... 39 more
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)