[ 
https://issues.apache.org/jira/browse/FLINK-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias updated FLINK-24169:
-----------------------------
    Description: 
While working on [PR #16989|https://github.com/apache/flink/pull/16989] for 
FLINK-23611, we experienced some flakiness when running 
{{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.

[~dmvk] discovered a bug in log4j (see 
[LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). The bug 
affects the tests because they check the log files for specific log messages. 
The log messages end up in the wrong log file if the rolling update mechanism 
is triggered. This does not seem to be an issue on AzureCI due to the slower 
hardware used for the worker machines.

A solution to overcome this issue would be to add a custom log4j configuration 
that disables the {{appender.main.policies.startup.type = 
OnStartupTriggeringPolicy}} setting, which is present in {{flink-dist}}'s log4j 
configuration.

  was:
While working on [PR #16989|https://github.com/apache/flink/pull/16989] for 
FLINK-23611, we experienced some flakiness when running 
{{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.

[~dmvk] discovered a bug in log4j (see 
[LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). This does not 
seem to be an issue on AzureCI due to the slower hardware used for the worker 
machines.

A solution to overcome this issue would be to add a custom log4j configuration 
that disables the {{appender.main.policies.startup.type = 
OnStartupTriggeringPolicy}} which is present in {{flink-dist}}'s log4j 
configuration.


> Flaky local YARN tests relying on log files
> -------------------------------------------
>
>                 Key: FLINK-24169
>                 URL: https://issues.apache.org/jira/browse/FLINK-24169
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>            Reporter: Matthias
>            Priority: Major
>              Labels: test-stability
>
> While working on [PR #16989|https://github.com/apache/flink/pull/16989] for 
> FLINK-23611, we experienced some flakiness when running 
> {{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.
> [~dmvk] discovered a bug in log4j (see 
> [LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). The bug 
> affects the tests because they check the log files for specific log messages. 
> The log messages end up in the wrong log file if the rolling update 
> mechanism is triggered. This does not seem to be an issue on AzureCI due to the 
> slower hardware used for the worker machines.
> A solution to overcome this issue would be to add a custom log4j 
> configuration that disables the {{appender.main.policies.startup.type = 
> OnStartupTriggeringPolicy}} setting, which is present in {{flink-dist}}'s log4j 
> configuration.
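
The proposed workaround could look roughly like the test-only configuration below. This is a minimal sketch, not the actual fix: it assumes the appender prefix {{appender.main}} from the property quoted above, and the appender name, log pattern, size limit, and file count are illustrative values rather than ones taken from the issue. The only point it demonstrates is a rolling file appender declared without the {{OnStartupTriggeringPolicy}} entry, so that no rollover happens at process start.

```properties
# Hypothetical test-only log4j2 configuration (sketch under the assumptions above).
# It keeps a rolling file appender but leaves out the startup triggering policy,
# so the log file is not rolled over when the process starts.
rootLogger.level = INFO
rootLogger.appenderRef.file.ref = MainAppender

appender.main.name = MainAppender
appender.main.type = RollingFile
appender.main.append = true
# Assumes the log file location is passed in via the log.file system property.
appender.main.fileName = ${sys:log.file}
appender.main.filePattern = ${sys:log.file}.%i
appender.main.layout.type = PatternLayout
appender.main.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
appender.main.policies.type = Policies
# Size-based rollover is kept; the size value is illustrative.
appender.main.policies.size.type = SizeBasedTriggeringPolicy
appender.main.policies.size.size = 100MB
# Intentionally absent:
# appender.main.policies.startup.type = OnStartupTriggeringPolicy
appender.main.strategy.type = DefaultRolloverStrategy
appender.main.strategy.max = 10
```

With a configuration like this on the test classpath, the messages the tests search for should stay in the file they were first written to, unless the size limit is reached during the run.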



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
