zentol opened a new pull request, #19957: URL: https://github.com/apache/flink/pull/19957
The RunFailedJobListener had rather obscure semantics. It considered a job to be terminal once it had been restarted, which is awfully specific to one particular test case. A cleaner approach is to just cancel the job and wait for it to terminate.

Additionally, it considered a job to be running purely based on the job status. Waiting for the tasks to be submitted is a better measure, in particular when checkpointing is involved. In fact, testExceptionHistoryWithTaskFailureFromStopWithSavepoint was broken: a savepoint was never triggered because not all tasks were running.

This PR also contains a few cleanup commits.
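
To illustrate the two signals described above (all tasks submitted, and job terminated after cancellation), here is a minimal sketch in plain Java. It is not the listener from this PR and does not use any Flink API; the class name JobLifecycleSketch and its callback methods are hypothetical and exist only for this example. It assumes the expected number of tasks is known up front.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not Flink code: tracks "all tasks submitted" and
 * "job terminated" as two separate futures that a test can wait on.
 */
final class JobLifecycleSketch {
    // Number of task submissions still outstanding (assumed known up front).
    private final AtomicInteger pendingTasks;
    private final CompletableFuture<Void> allTasksSubmitted = new CompletableFuture<>();
    private final CompletableFuture<Void> terminated = new CompletableFuture<>();

    JobLifecycleSketch(int expectedTasks) {
        this.pendingTasks = new AtomicInteger(expectedTasks);
        if (expectedTasks <= 0) {
            // Nothing to wait for, so the job counts as running immediately.
            allTasksSubmitted.complete(null);
        }
    }

    /** Hypothetical callback: invoked once per submitted task. */
    void onTaskSubmitted() {
        if (pendingTasks.decrementAndGet() == 0) {
            allTasksSubmitted.complete(null);
        }
    }

    /** Hypothetical callback: invoked when the job reaches a terminal state. */
    void onJobTerminated() {
        terminated.complete(null);
    }

    /** Completes once every expected task has been submitted. */
    CompletableFuture<Void> allTasksSubmitted() {
        return allTasksSubmitted;
    }

    /** Completes once the job has terminated, e.g. after being cancelled. */
    CompletableFuture<Void> terminated() {
        return terminated;
    }
}
```

A test built along these lines would wait on allTasksSubmitted() before triggering a savepoint, then cancel the job and wait on terminated(), instead of inferring either state from the job status or from a restart.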
