[ https://issues.apache.org/jira/browse/FLINK-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404949#comment-17404949 ]
Matthias commented on FLINK-23611:
----------------------------------
We ran into a timeout because of the YARN Session cluster's
[join|https://github.com/XComp/flink/blob/646ff2d36f40704f5dca017b8fffed78bd51b307/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionCapacitySchedulerITCase.java#L357]
call waiting for the thread to finish (see {{jps-traces.0}}):
{code:java}
"main" #1 prio=5 os_prio=0 tid=0x00007fedec00b800 nid=0x52d6 in Object.wait() [0x00007fedf5f8b000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x0000000095e06048> (a org.apache.flink.yarn.YarnTestBase$Runner)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.lambda$testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots$3(YARNSessionCapacitySchedulerITCase.java:357)
    at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase$$Lambda$492/592858578.run(Unknown Source)
    at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288)
    at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots(YARNSessionCapacitySchedulerITCase.java:293)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...]
{code}
This {{join}} call should have returned because of the {{sendStop}} call that
was triggered beforehand.
I have to do another round to double-check, but my guess is that the stop call
never reached the thread: the earlier failure of the job submission Runner
reset the System input/output streams, which also cut off the communication
between the {{main}} thread and the YARN Session Cluster thread.
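To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It is not the actual {{YarnTestBase}} code: the class {{StreamResetSketch}}, the method {{workerStopsAfterStopCommand}}, the idle pipe standing in for the real stdin, and the 1-second join timeout are all hypothetical names and values chosen for the example. It assumes the worker only picks up {{System.in}} when it starts reading, so a reset that happens before that point leaves the "stop" command sitting in a pipe nobody reads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the suspected failure mode; not the real test code. */
public class StreamResetSketch {

    /** Returns true if the worker terminated within the join timeout after "stop" was sent. */
    public static boolean workerStopsAfterStopCommand(boolean resetBeforeWorkerStarts)
            throws IOException, InterruptedException {
        InputStream originalIn = System.in;

        // Pipe the test uses to send commands to the worker through System.in.
        PipedOutputStream commandOut = new PipedOutputStream();
        PipedInputStream commandIn = new PipedInputStream(commandOut);

        // Assumption: stand-in for the "real" stdin, a connected pipe that never receives data.
        PipedOutputStream idleOut = new PipedOutputStream();
        PipedInputStream idleIn = new PipedInputStream(idleOut);

        Thread worker = new Thread(() -> {
            // System.in is resolved here, when the worker starts reading,
            // not at the time the test installed its pipe.
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    if ("stop".equals(line.trim())) {
                        return;
                    }
                }
            } catch (IOException e) {
                // Interrupted during cleanup or pipe closed: just exit.
            }
        });
        worker.setDaemon(true);

        try {
            System.setIn(commandIn);
            if (resetBeforeWorkerStarts) {
                // Simulates an earlier failure path that resets the stream.
                System.setIn(idleIn);
            }
            worker.start();

            // Equivalent of a "send stop" step: the command goes into the test's pipe.
            commandOut.write("stop\n".getBytes(StandardCharsets.UTF_8));
            commandOut.flush();

            // Bounded join so the sketch itself cannot hang.
            worker.join(1000);
            return !worker.isAlive();
        } finally {
            worker.interrupt();
            System.setIn(originalIn);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("streams intact: " + workerStopsAfterStopCommand(false));
        System.out.println("streams reset:  " + workerStopsAfterStopCommand(true));
    }
}
```

With the streams intact the worker sees "stop" and the join returns; with the reset in place the bounded join times out while the worker is still alive. An unbounded {{join()}}, as in the thread dump above, would block indefinitely in that second case.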
> YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots hangs on azure
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-23611
> URL: https://issues.apache.org/jira/browse/FLINK-23611
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Affects Versions: 1.14.0, 1.12.5
> Reporter: Xintong Song
> Assignee: Matthias
> Priority: Major
> Labels: pull-request-available, test-stability
> Fix For: 1.14.0, 1.12.6
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21439&view=logs&j=245e1f2e-ba5b-5570-d689-25ae21e5302f&t=e7f339b2-a7c3-57d9-00af-3712d4b15354&l=28959
--
This message was sent by Atlassian Jira
(v8.3.4#803005)