[ 
https://issues.apache.org/jira/browse/FLINK-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404949#comment-17404949
 ] 

Matthias commented on FLINK-23611:
----------------------------------

We ran into a timeout because of the YARN Session cluster's 
[join|https://github.com/XComp/flink/blob/646ff2d36f40704f5dca017b8fffed78bd51b307/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionCapacitySchedulerITCase.java#L357]
 call waiting for the thread to finish (see {{jps-traces.0}}):
{code:java}
"main" #1 prio=5 os_prio=0 tid=0x00007fedec00b800 nid=0x52d6 in Object.wait() [0x00007fedf5f8b000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1252)
        - locked <0x0000000095e06048> (a org.apache.flink.yarn.YarnTestBase$Runner)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.lambda$testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots$3(YARNSessionCapacitySchedulerITCase.java:357)
        at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase$$Lambda$492/592858578.run(Unknown Source)
        at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288)
        at org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots(YARNSessionCapacitySchedulerITCase.java:293)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...] {code}
This {{join}} call should have returned because of the {{sendStop}} call that 
was triggered beforehand.

I have to do another round to double-check, but I suspect the stop call never 
reached the thread: the earlier failure of the job submission Runner reset the 
System input/output streams, which also cut off the communication between the 
{{main}} thread and the YARN Session Cluster thread.
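
To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It is not the actual test code: {{StdinResetSketch}}, {{stopReachesListener}} and the stand-in streams are hypothetical names, and it assumes that the listening thread binds to {{System.in}} only after the stream has been reset, which may not match the real ordering in the test. It uses a bounded {{join}} so that a lost stop command shows up as a result instead of a hang.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

class StdinResetSketch {

    /**
     * Returns true if the listener thread terminates after "stop" is sent.
     * resetBeforeStart simulates another runner restoring System.in too early.
     */
    public static boolean stopReachesListener(boolean resetBeforeStart) throws Exception {
        InputStream realIn = System.in;
        // Stand-in for the console stdin: a connected pipe nobody writes to, so reads block.
        PipedOutputStream idleWriter = new PipedOutputStream();
        InputStream consoleStandIn = new PipedInputStream(idleWriter);
        try {
            // The test wires its own pipe into System.in to talk to the listener.
            PipedOutputStream stopChannel = new PipedOutputStream();
            System.setIn(new PipedInputStream(stopChannel));

            if (resetBeforeStart) {
                // Assumed failure path: someone else "restores" System.in.
                System.setIn(consoleStandIn);
            }

            // The listener binds to whatever System.in is at this moment.
            final InputStream boundIn = System.in;
            Thread listener = new Thread(() -> {
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(boundIn, StandardCharsets.UTF_8));
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if ("stop".equals(line.trim())) {
                            return; // clean shutdown
                        }
                    }
                } catch (IOException e) {
                    // Treat a broken pipe like end of input in this sketch.
                }
            });
            listener.setDaemon(true); // a stuck listener must not keep the JVM alive
            listener.start();

            // Hypothetical stand-in for sendStop: write the command into the test's own pipe.
            stopChannel.write("stop\n".getBytes(StandardCharsets.UTF_8));
            stopChannel.flush();

            // Bounded join instead of join(): a lost command becomes visible, not a hang.
            listener.join(2000);
            return !listener.isAlive();
        } finally {
            System.setIn(realIn);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stop delivered without reset: " + stopReachesListener(false));
        System.out.println("stop delivered after reset:   " + stopReachesListener(true));
    }
}
```

In this sketch the command is delivered only when the listener still reads from the pipe the test writes to. After the reset, the write succeeds but nobody reads it, and an unbounded {{join}} on the listener would block indefinitely, as in the trace above.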

> YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots
>  hangs on azure
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-23611
>                 URL: https://issues.apache.org/jira/browse/FLINK-23611
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>    Affects Versions: 1.14.0, 1.12.5
>            Reporter: Xintong Song
>            Assignee: Matthias
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.14.0, 1.12.6
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21439&view=logs&j=245e1f2e-ba5b-5570-d689-25ae21e5302f&t=e7f339b2-a7c3-57d9-00af-3712d4b15354&l=28959



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
