MartijnVisser opened a new pull request, #28406: URL: https://github.com/apache/flink/pull/28406

## What is the purpose of the change

`JobMasterTriggerSavepointITCase.testDoNotCancelJobIfSavepointFails` failed in [build 75865](https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=75865) (leg `test_cron_azure tests`): the assertion that `StandaloneResourceManager` logged `Disconnect job manager .*` found only the "Registering/Registered job manager" events.

Root cause: the test cancels the job and `waitForDisconnect` waits for the client-visible status to reach `CANCELED`, after which `verifyJobIdIsLogged` asserts the disconnect was logged. The JobMaster disconnects from the ResourceManager asynchronously while shutting down, which happens *after* the job reaches CANCELED. The run logs show the window: job CANCELED at `06:17:51,115`, JobMaster began stopping at `06:17:51,136`, and the verification ran in between.

FLINK-37821 fixed an earlier failure of this test on a different signal; this is a distinct race against the RM disconnect logging, tracked in FLINK-39917.

## Brief change log

- In `waitForDisconnect`, after the CANCELED wait, additionally wait until the ResourceManager has actually logged the "Disconnect job manager" event before returning. What is asserted does not change.
- Factor the disconnect log prefix into a shared constant used by both the new wait predicate and `verifyJobIdIsLogged`'s regex, so the two cannot drift.

## Verifying this change

This change is already covered by existing tests: `JobMasterTriggerSavepointITCase` (4 run, 0 failures).
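
For readers who want to see the shape of the fix without opening the diff, the pattern from the change log (wait for the log event, then assert with a regex built from the same constant) can be sketched roughly as below. This is an illustration only, not the code in this PR: the class `DisconnectWaitSketch`, the constant name `DISCONNECT_LOG_PREFIX`, the helper `waitUntil`, and the in-memory list standing in for captured ResourceManager log output are all hypothetical, and the log lines are simulated.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.regex.Pattern;

public class DisconnectWaitSketch {

    // Hypothetical shared constant: the single source for the wait predicate and the regex.
    static final String DISCONNECT_LOG_PREFIX = "Disconnect job manager ";

    // Assertion regex derived from the constant, so it cannot drift from the predicate.
    static final Pattern DISCONNECT_PATTERN =
            Pattern.compile(Pattern.quote(DISCONNECT_LOG_PREFIX) + ".*");

    /** Polls the condition every 10 ms until it holds or the timeout elapses. */
    static void waitUntil(BooleanSupplier condition, Duration timeout)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                throw new TimeoutException("Condition not met within " + timeout);
            }
            Thread.sleep(10);
        }
    }

    /** Wait predicate: true once any captured message starts with the shared prefix. */
    static boolean disconnectLogged(List<String> messages) {
        return messages.stream().anyMatch(m -> m.startsWith(DISCONNECT_LOG_PREFIX));
    }

    /** Verification step: the same check expressed through the derived regex. */
    static boolean verifyDisconnectLogged(List<String> messages) {
        return messages.stream().anyMatch(m -> DISCONNECT_PATTERN.matcher(m).matches());
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for captured log output (simulated messages, not real Flink log lines).
        List<String> rmLog = new CopyOnWriteArrayList<>();
        rmLog.add("Registered job manager (simulated)");

        // Simulates the asynchronous shutdown that logs the disconnect slightly later.
        Thread shutdown = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            rmLog.add(DISCONNECT_LOG_PREFIX + "(simulated)");
        });
        shutdown.start();

        // Without this wait, the verification below could run inside the race window.
        waitUntil(() -> disconnectLogged(rmLog), Duration.ofSeconds(5));
        System.out.println(verifyDisconnectLogged(rmLog));
        shutdown.join();
    }
}
```

In the sketch, the wait and the assertion both depend on `DISCONNECT_LOG_PREFIX`, which mirrors the intent of the second change-log item: if the log wording changes, both sites change together.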
## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature? no

---

##### Was generative AI tooling used to co-author this PR?

- [X] Yes (Claude Opus 4.8 via Claude Code)

Generated-by: Claude Opus 4.8 (1M context)
