[
https://issues.apache.org/jira/browse/FLINK-28199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750260#comment-17750260
]
Matthias Pohl commented on FLINK-28199:
---------------------------------------
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51893&view=logs&j=298e20ef-7951-5965-0e79-ea664ddc435e&t=d4c90338-c843-57b0-3232-10ae74f00347&l=27086
This time, only {{testClusterClientRetrieval}} failed. The JM process
finished without any issues at {{2023-08-02 03:21:25,901}}, and the test
triggered the cleanup, but the application wasn't cleared:
{code}
Aug 02 03:22:02 [ERROR] org.apache.flink.yarn.YARNHighAvailabilityITCase.testClusterClientRetrieval Time elapsed: 29.494 s <<< FAILURE!
Aug 02 03:22:02 java.lang.AssertionError: There is at least one application on the cluster that is not finished.[App application_1690946369165_0003 is in state RUNNING.]
Aug 02 03:22:02 at org.apache.flink.yarn.YarnTestBase$CleanupYarnApplication.close(YarnTestBase.java:336)
Aug 02 03:22:02 at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:300)
Aug 02 03:22:02 at org.apache.flink.yarn.YARNHighAvailabilityITCase.testClusterClientRetrieval(YARNHighAvailabilityITCase.java:221)
Aug 02 03:22:02 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[...]
{code}
What about increasing the deadline for shutting down the YARN applications?
It's currently set to 10s (see
[apache/flink:org.apache.flink.yarn.YarnTestBase:310|https://github.com/apache/flink/blob/c8ae39d4ac73f81873e1d8ac37e17c29ae330b23/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java#L310]).
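To make the proposal concrete, here is a minimal sketch of a deadline-based wait with a configurable timeout. It is not the actual {{YarnTestBase}} code: the class {{ShutdownDeadline}}, the method {{awaitAllFinished}} and the {{allFinished}} check are hypothetical names introduced only for illustration, and the check stands in for whatever the test uses to ask YARN whether any application is still running.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of YarnTestBase. */
final class ShutdownDeadline {

    private ShutdownDeadline() {}

    /**
     * Polls {@code allFinished} until it returns true or {@code timeout} elapses.
     *
     * @return true if the condition held before the deadline; false on timeout
     *     or if the waiting thread was interrupted
     */
    static boolean awaitAllFinished(
            BooleanSupplier allFinished, Duration timeout, Duration pollInterval) {
        final long deadlineNanos = System.nanoTime() + timeout.toNanos();
        while (true) {
            // Check first, so an already-finished cluster returns immediately.
            if (allFinished.getAsBoolean()) {
                return true;
            }
            final long remainingNanos = deadlineNanos - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Never sleep past the deadline, but sleep at least 1 ms.
            final long sleepMillis =
                    Math.min(pollInterval.toMillis(), Math.max(1L, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With a helper like this, raising the deadline would be a one-argument change, e.g. passing {{Duration.ofSeconds(30)}} instead of {{Duration.ofSeconds(10)}}; the 30s value is only an example, not a measured recommendation.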
> Failures on YARNHighAvailabilityITCase.testClusterClientRetrieval and
> YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-28199
> URL: https://issues.apache.org/jira/browse/FLINK-28199
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Affects Versions: 1.16.0
> Reporter: Martijn Visser
> Priority: Major
> Labels: test-stability
>
> {code:java}
> Jun 22 08:57:50 [ERROR] Errors:
> Jun 22 08:57:50 [ERROR]
> YARNHighAvailabilityITCase.testClusterClientRetrieval » Timeout
> testClusterCli...
> Jun 22 08:57:50 [ERROR]
> YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint:156->YarnTestBase.runTest:288->lambda$testKillYarnSessionClusterEntrypoint$0:182->waitForJobTermination:325
> » Execution
> Jun 22 08:57:50 [INFO]
> Jun 22 08:57:50 [ERROR] Tests run: 27, Failures: 0, Errors: 2, Skipped: 0
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=37037&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=29523