Purushottam Sinha created FLINK-39963:
-----------------------------------------
Summary: Runtime: ExecutionTimeBasedSlowTaskDetectorTest.testBalancedInput flaky
Key: FLINK-39963
URL: https://issues.apache.org/jira/browse/FLINK-39963
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Reporter: Purushottam Sinha
Problem
The test fails intermittently in CI with "AssertionError: Expected size: 2 but
was: 0", which fails the core test module. This is a test-only timing race, not
a production defect.
Evidence
- Assertion fails at ExecutionTimeBasedSlowTaskDetectorTest.java:269
(assertThat(slowTasks).hasSize(2)).
- findSlowTasks reads System.currentTimeMillis()
(ExecutionTimeBasedSlowTaskDetector.java:148) and marks a task slow only if its
execution time is strictly greater than the baseline (median × multiplier).
With createSlowTaskDetector(0.3, 1, 0) and equal input bytes, the two running
tasks must out-age the one finished task.
- The test relies on real elapsed time. When setup, markFinished(), and
findSlowTasks() all run within one millisecond tick, the execution time of the
running tasks is not strictly greater than the baseline, so 0 slow tasks are
detected instead of the expected 2.
- Observed: Java 11 / Test (core), commit 4902753. CI:
https://github.com/apache/flink/actions/runs/27802684638/job/82280284780
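
The strict comparison described above can be illustrated with a small
standalone model. This is a simplified sketch, not the detector's code: the
class and method names are hypothetical, and it ignores the finished-ratio
threshold and any input-byte normalization. It only shows that with a
multiplier of 1, running tasks whose execution time equals the finished task's
time are not counted as slow.

```java
import java.util.Arrays;

public class SlowTaskBaselineSketch {

    /** Hypothetical helper: median of the execution times (upper median for even counts). */
    public static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Counts running tasks whose execution time is strictly greater than median * multiplier. */
    public static int countSlow(long[] finishedTimes, long[] runningTimes, double multiplier) {
        double baseline = median(finishedTimes) * multiplier;
        int slow = 0;
        for (long t : runningTimes) {
            if (t > baseline) { // strict comparison: equal times are not slow
                slow++;
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Everything happened within one millisecond tick: all execution times are 0.
        System.out.println(countSlow(new long[] {0}, new long[] {0, 0}, 1.0)); // prints 0
        // The running tasks have aged 1 ms past the finished task.
        System.out.println(countSlow(new long[] {0}, new long[] {1, 1}, 1.0)); // prints 2
    }
}
```

The first call corresponds to the failing CI run (0 detected), the second to
the usual passing case where the running tasks are measurably older.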
Proposed fix
- Set explicit execution-state timestamps so running tasks deterministically
exceed the finished baseline, instead of relying on wall-clock granularity.
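
A minimal sketch of the idea, assuming a hypothetical Task stand-in with
explicit start and finish timestamps (this is not Flink's Execution class or
its test utilities, and the real fix would go through whatever timestamp
setters the test harness offers). Backdating the start timestamps makes the
running tasks exceed the finished task's execution time regardless of how
little wall-clock time passes inside the test.

```java
public class DeterministicTimestampSketch {

    /** Hypothetical stand-in for a task attempt with explicit state timestamps. */
    public static final class Task {
        final long startTs;
        final long finishTs; // -1 while the task is still running

        public Task(long startTs, long finishTs) {
            this.startTs = startTs;
            this.finishTs = finishTs;
        }

        long executionTime(long now) {
            return (finishTs >= 0 ? finishTs : now) - startTs;
        }
    }

    /** Counts running tasks whose age strictly exceeds the finished task's time * multiplier. */
    public static int countSlow(Task finished, Task[] running, long now, double multiplier) {
        double baseline = finished.executionTime(now) * multiplier;
        int slow = 0;
        for (Task t : running) {
            if (t.executionTime(now) > baseline) {
                slow++;
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Explicit timestamps: the finished task ran for 10 ms,
        // while both running tasks started 100 ms ago.
        Task finished = new Task(now - 100, now - 90);
        Task[] running = {new Task(now - 100, -1), new Task(now - 100, -1)};
        // Expected to report 2 unless the wall clock jumps backwards by 90 ms or more.
        System.out.println(countSlow(finished, running, System.currentTimeMillis(), 1.0));
    }
}
```

With this setup the outcome no longer depends on millisecond granularity: the
running tasks are at least 100 ms old when sampled, against a 10 ms baseline.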