zentol commented on PR #19968: URL: https://github.com/apache/flink/pull/19968#issuecomment-1157673594
The tests for the savepoint operations are scattered around quite a bit.

We unfortunately can't fully cover this in `StopWithSavepointTest` because that would require an actual execution graph. Creating one ourselves isn't really an option (there are barely any contracts; everything just relies on the existing behavior of the scheduler), and we also lack good test utils. Moving away from the ExecutionGraph, while technically possible, can't be done quickly because so many re-used components expect an execution graph.

Waiting for the savepoint to complete in `onFailure`/`onGloballyTerminalState` is covered by the newly added cases in `StopWithSavepointTest`.

Not accidentally triggering 2 state transitions from the state is now enforced by 71f72cf57d820ed62560f07f62259408e3a18b52; this on its own would've failed tests in `StopWithSavepointTest`, like `testJobFailedAndSavepointOperationFails`. We likely would've noticed the issue sooner if we had had this earlier.

As for other pre-existing tests:

The `AdaptiveSchedulerTest` contains tests for the proper archiving of errors that occurred during `StopWithSavepoint`. These make sure we don't accidentally drop task failures.

The `AdaptiveSchedulerITCase` contains high-level tests for the happy path and certain errors on the TM side. These make sure the savepoint operation does complete if a task failed.
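
For illustration only, the "no second state transition" rule can be pictured as a small guard like the one below. This is a minimal sketch under assumptions: `SingleTransitionGuard` and its methods are hypothetical names, state names are plain strings, and nothing here is taken from the referenced commit or from Flink's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Flink code): a state that allows exactly one
 * outgoing transition and fails loudly on any further attempt.
 */
final class SingleTransitionGuard {
    // Records the transitions that were accepted; holds at most one entry.
    private final List<String> transitions = new ArrayList<>();

    /** Accepts the first transition; throws on every later one. */
    void transitionTo(String targetState) {
        if (!transitions.isEmpty()) {
            throw new IllegalStateException(
                    "Already transitioned to " + transitions.get(0)
                            + "; refusing transition to " + targetState);
        }
        transitions.add(targetState);
    }

    /** Returns the accepted transitions (empty or a single element). */
    List<String> transitions() {
        return transitions;
    }
}
```

With a guard of this shape, a test that causes both a failure-driven and a completion-driven transition fails immediately with an `IllegalStateException` instead of silently passing.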
