[
https://issues.apache.org/jira/browse/FLINK-29618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-29618:
-----------------------------------
Labels: pull-request-available starter test-stability (was: starter
test-stability)
> YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI
> -------------------------------------------------------------------
>
> Key: FLINK-29618
> URL: https://issues.apache.org/jira/browse/FLINK-29618
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN, Tests
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Assignee: Wencong Liu
> Priority: Major
> Labels: pull-request-available, starter, test-stability
> Attachments:
> build-20221012.7.YARNSessionFIFOSecuredITCase.testDetachedMode.log
>
>
> We experienced a [build
> failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41931&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=30284]
> that was caused (exclusively) by
> {{YARNSessionFIFOSecuredITCase.testDetachedMode}} running into a timeout.
> The test-specific logs, which were extracted from the build, are attached to
> this Jira issue.
> JUnit tries to stop the thread running the test but fails to do so because
> it's interrupting a sleep. The {{InterruptedException}} is not properly
> handled in
> [YarnTestBase:744|https://github.com/apache/flink/blob/573ed922346c791760d27653543c2b8df56f51f7/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java#L744]
> (it doesn't forward the exception). Therefore, we only see the warning being
> logged after 60s:
> {code}
> 11:33:51,124 [ForkJoinPool-1-worker-25] WARN org.apache.flink.yarn.YarnTestBase [] - Interruped
> java.lang.InterruptedException: sleep interrupted
>     at java.lang.Thread.sleep(Native Method) ~[?:1.8.0_292]
>     at org.apache.flink.yarn.YarnTestBase.sleep(YarnTestBase.java:716) ~[test-classes/:?]
>     at org.apache.flink.yarn.YarnTestBase.startWithArgs(YarnTestBase.java:906) ~[test-classes/:?]
>     at org.apache.flink.yarn.YARNSessionFIFOITCase.runDetachedModeTest(YARNSessionFIFOITCase.java:141) ~[test-classes/:?]
>     at org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.lambda$testDetachedMode$2(YARNSessionFIFOSecuredITCase.java:173) ~[test-classes/:?]
>     at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288) ~[test-classes/:?]
>     at org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.testDetachedMode(YARNSessionFIFOSecuredITCase.java:160) ~[test-classes/:?]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_292]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_292]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
> [...]
> {code}
> The test code itself eventually continues and succeeds (despite the
> interruption). The job submission takes suspiciously long, though.
> Removing the timeout from the test (as this is the desired approach for tests
> in general now) should solve this test instability.
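
To illustrate the interruption behaviour described in the issue, here is a minimal, self-contained sketch. It is not the code in YarnTestBase; the class and method names (InterruptHandlingSketch, sleepSwallowing, sleepForwarding, interruptVisibleAfter) are hypothetical and exist only for this example. It contrasts a sleep helper that only logs the InterruptedException with one that restores the thread's interrupt flag so that callers can still react to it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptHandlingSketch {

    /** Hypothetical helper: logs the interruption and returns, so the caller never sees it. */
    static void sleepSwallowing(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Thread.sleep cleared the interrupt flag before throwing; nothing restores it here.
            System.out.println("WARN interrupted, but the interrupt is not forwarded");
        }
    }

    /** Hypothetical helper: restores the interrupt flag so the caller can observe it. */
    static void sleepForwarding(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Re-set the flag that Thread.sleep cleared when it threw.
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Runs the given sleeper on a worker thread, interrupts that thread, and
     * reports whether the interrupt flag was still set once the sleeper returned.
     */
    static boolean interruptVisibleAfter(Runnable sleeper) throws InterruptedException {
        AtomicBoolean stillInterrupted = new AtomicBoolean();
        Thread worker = new Thread(() -> {
            sleeper.run();
            stillInterrupted.set(Thread.currentThread().isInterrupted());
        });
        worker.start();
        worker.interrupt(); // plays the role of the test framework trying to stop the thread
        worker.join();
        return stillInterrupted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("swallowing: " + interruptVisibleAfter(() -> sleepSwallowing(5_000)));
        System.out.println("forwarding: " + interruptVisibleAfter(() -> sleepForwarding(5_000)));
    }
}
```

In this sketch the swallowing variant leaves the worker with a cleared interrupt flag, so any code that runs afterwards simply continues, which matches the observation above that the test carries on after the warning is logged. The forwarding variant keeps the flag set. Whether restoring the flag or rethrowing is the better choice for the real test base class is a separate question; the fix proposed in the issue is to remove the test timeout.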