[
https://issues.apache.org/jira/browse/FLINK-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias updated FLINK-24169:
-----------------------------
Description:
While working on [PR #16989|https://github.com/apache/flink/pull/16989] for
FLINK-23611, we experienced some flakiness when running
{{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.
[~dmvk] discovered a bug in log4j (see
[LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). The bug
affects the tests because they check the log files for specific log messages.
The log messages end up in the wrong log file if the rollover mechanism is
triggered. This does not seem to be an issue on AzureCI due to the slower
hardware used for the worker machines.
A solution to overcome this issue would be to add a custom log4j configuration
that disables the
{{appender.main.policies.startup.type = OnStartupTriggeringPolicy}} setting,
which is present in {{flink-dist}}'s log4j configuration.
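As a rough sketch of what such a custom configuration could look like (an assumption, not the actual fix): the fragment below defines a rolling file appender similar to the one in {{flink-dist}}, but leaves out the startup policy. The appender name, file pattern, layout, size limit and the {{log.file}} system property are illustrative and should be checked against the real {{flink-dist}} configuration.

```properties
# Hypothetical test-only log4j2 configuration (sketch; values are illustrative).
rootLogger.level = INFO
rootLogger.appenderRef.file.ref = MainAppender

appender.main.name = MainAppender
appender.main.type = RollingFile
appender.main.append = true
# log.file is assumed to be passed to the JVM as a system property.
appender.main.fileName = ${sys:log.file}
appender.main.filePattern = ${sys:log.file}.%i
appender.main.layout.type = PatternLayout
appender.main.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
appender.main.policies.type = Policies
appender.main.policies.size.type = SizeBasedTriggeringPolicy
appender.main.policies.size.size = 100MB
# Intentionally omitted so that no rollover is triggered on startup:
# appender.main.policies.startup.type = OnStartupTriggeringPolicy
appender.main.strategy.type = DefaultRolloverStrategy
appender.main.strategy.max = 10
```

With the startup policy left out, rollover is only triggered by file size, so a short test run should keep writing to a single log file.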
was:
While working on [PR #16989|https://github.com/apache/flink/pull/16989] for
FLINK-23611, we experienced some flakiness when running
{{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.
[~dmvk] discovered a bug in log4j (see
[LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). This does not
seem to be an issue on AzureCI due to the slower hardware used for the worker
machines.
A solution to overcome this issue would be to add a custom log4j configuration
that disables the {{appender.main.policies.startup.type =
OnStartupTriggeringPolicy}} which is present in {{flink-dist}}'s log4j
configuration.
> Flaky local YARN tests relying on log files
> -------------------------------------------
>
> Key: FLINK-24169
> URL: https://issues.apache.org/jira/browse/FLINK-24169
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Reporter: Matthias
> Priority: Major
> Labels: test-stability
>
> While working on [PR #16989|https://github.com/apache/flink/pull/16989] for
> FLINK-23611, we experienced some flakiness when running
> {{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.
> [~dmvk] discovered a bug in log4j (see
> [LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). The bug
> affects the tests because they check the log files for specific log messages.
> The log messages end up in the wrong log file if the rollover mechanism is
> triggered. This does not seem to be an issue on AzureCI due to the
> slower hardware used for the worker machines.
> A solution to overcome this issue would be to add a custom log4j
> configuration that disables the
> {{appender.main.policies.startup.type = OnStartupTriggeringPolicy}} setting,
> which is present in {{flink-dist}}'s log4j configuration.