Martijn Visser created FLINK-40010:
--------------------------------------

             Summary: RescaleTimelineITCase.testRescaleTerminatedByNoResourcesOrNoParallelismsChange
                      is flaky: requirements-update can miss the in-progress rescale
                 Key: FLINK-40010
                 URL: https://issues.apache.org/jira/browse/FLINK-40010
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination, Tests
    Affects Versions: 2.4.0
            Reporter: Martijn Visser
            Assignee: Martijn Visser


testRescaleTerminatedByNoResourcesOrNoParallelismsChange fails on CI: the awaited
terminal reason NO_RESOURCES_OR_PARALLELISMS_CHANGE is never recorded, so the wait
times out (or, before https://issues.apache.org/jira/browse/FLINK-40009, hangs).

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76350&view=results (leg: test_cron_azure core)
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76242&view=results (leg: test_cron_jdk11 core)

Root cause: DefaultStateTransitionManager stamps NO_RESOURCES_OR_PARALLELISMS_CHANGE
only on the rescale that is tracked when the manager (re-)enters its Idling phase.
With the short shared cooldown, the cooldown can elapse and the manager can reach
Idling before the requirements-update RPC is processed. The UPDATE_REQUIREMENT
rescale is then created after Idling was entered and never receives the terminal
reason; it stays in-progress until teardown cancels it (JOB_CANCELED).
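
The ordering problem described above can be illustrated with a small toy model. This
is only a sketch under simplifying assumptions, not Flink code: the classes Manager
and Rescale and the methods onRequirementsUpdate, onCooldownElapsed and onTeardown
are hypothetical stand-ins, and the real DefaultStateTransitionManager has more
phases and more logic than shown here.

```java
/** Toy model of the race; all names here are hypothetical, not Flink APIs. */
final class RescaleRaceSketch {

    enum Phase { COOLDOWN, IDLING }

    /** Stand-in for a tracked rescale record. */
    static final class Rescale {
        final String trigger;
        String terminalReason; // null while the rescale is still in progress

        Rescale(String trigger) {
            this.trigger = trigger;
        }
    }

    /** Stand-in for the state transition manager. */
    static final class Manager {
        Phase phase = Phase.COOLDOWN;
        Rescale tracked;

        /** The requirements-update RPC creates the UPDATE_REQUIREMENT rescale. */
        void onRequirementsUpdate() {
            tracked = new Rescale("UPDATE_REQUIREMENT");
        }

        /** Entering Idling stamps whatever rescale is tracked at that moment. */
        void onCooldownElapsed() {
            phase = Phase.IDLING;
            if (tracked != null && tracked.terminalReason == null) {
                tracked.terminalReason = "NO_RESOURCES_OR_PARALLELISMS_CHANGE";
            }
        }

        /** Teardown cancels any rescale that is still in progress. */
        void onTeardown() {
            if (tracked != null && tracked.terminalReason == null) {
                tracked.terminalReason = "JOB_CANCELED";
            }
        }
    }

    /** Replays one of the two orderings and returns the recorded terminal reason. */
    static String run(boolean updateBeforeCooldownElapses) {
        Manager manager = new Manager();
        if (updateBeforeCooldownElapses) {
            manager.onRequirementsUpdate(); // processed while still in Cooldown
            manager.onCooldownElapsed();    // Idling entry stamps the reason
        } else {
            manager.onCooldownElapsed();    // Idling entered with nothing tracked
            manager.onRequirementsUpdate(); // created too late to be stamped
        }
        manager.onTeardown();
        return manager.tracked.terminalReason;
    }

    public static void main(String[] args) {
        System.out.println("update in Cooldown:  " + run(true));
        System.out.println("update after Idling: " + run(false));
    }
}
```

In this model the first ordering records NO_RESOURCES_OR_PARALLELISMS_CHANGE, while
the second leaves the rescale in progress until teardown records JOB_CANCELED, which
matches the failure seen on CI.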

Fix: rebuild the fixture cluster with a 10s cooldown, which comfortably outlasts the
synchronous update RPC, so the update is processed in Cooldown and routed back
through Idling, where the reason is stamped. Unlike
testRescaleTerminatedByResourceRequirementsUpdated (FLINK-39903), this case must
wait out the whole cooldown before the condition can be met, so the cooldown is
kept modest and the wait budget is widened to 60s.
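
The relationship between the cooldown and the wait budget can be sketched as a
deadline-bounded poll. This is an illustrative helper only: WaitBudgetSketch,
waitUntil and headroom are hypothetical names, and the actual test may use a
different waiting utility. The 10s and 60s values are the ones stated above.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper illustrating a bounded wait; not part of Flink. */
final class WaitBudgetSketch {

    static final Duration COOLDOWN = Duration.ofSeconds(10);
    static final Duration WAIT_BUDGET = Duration.ofSeconds(60);

    /** Time left in the budget after the full cooldown has been waited out. */
    static Duration headroom(Duration cooldown, Duration budget) {
        return budget.minus(cooldown);
    }

    /**
     * Polls the condition until it holds or the budget is exhausted.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration budget, Duration pollInterval) {
        long deadline = System.nanoTime() + budget.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("headroom after cooldown: "
                + headroom(COOLDOWN, WAIT_BUDGET).getSeconds() + "s");
    }
}
```

With these numbers the terminal reason cannot appear before roughly 10s have
passed, which leaves about 50s of the 60s budget for scheduling delays on a loaded
CI machine.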

Related: FLINK-39902, FLINK-39903 (sibling races in the same class).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
