rmetzger opened a new pull request #15804:
URL: https://github.com/apache/flink/pull/15804


   This addresses the following problem in the 
testStopWithSavepointFailOnFirstSavepointSucceedOnSecond() test.
   
   Once all tasks are running, the test triggers a savepoint, which 
intentionally fails because of a test exception thrown in a task's 
checkpointing method. The test then waits for the savepoint future to fail and 
for the scheduler to restart the tasks. Once they are running again, it checks 
that the savepoint directory has been removed. In the reported run, the 
savepoint directory still existed at that point.
   
   The savepoint directory is removed by the PendingCheckpoint.discard() 
method, which runs on the I/O executor pool of the CheckpointCoordinator. 
There is no guarantee that the discard has completed by the time the job is 
running again: the executor is shut down with the dispatcher, so its lifecycle 
is not bound to job restarts.
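
   The race can be illustrated with a minimal, self-contained sketch. This is 
not Flink code and not necessarily the change made in this PR: the class 
AsyncDiscardSketch, the helper waitUntilDeleted, the 200 ms delay and the 
timeout are all hypothetical names and values chosen for illustration. A 
single-threaded executor stands in for the I/O executor pool, and a temporary 
directory stands in for the savepoint directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncDiscardSketch {

    /**
     * Polls until the given path no longer exists or the timeout elapses.
     * Returns true if the path disappeared in time, false otherwise.
     */
    static boolean waitUntilDeleted(Path dir, Duration timeout)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (Files.exists(dir)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out, directory is still there
            }
            Thread.sleep(10);
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the savepoint directory (assumption, not a Flink path).
        Path savepointDir = Files.createTempDirectory("savepoint-sketch");
        // Stand-in for the I/O executor pool that runs the discard.
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        try {
            ioExecutor.execute(() -> {
                try {
                    Thread.sleep(200); // simulated delay before cleanup runs
                    Files.deleteIfExists(savepointDir);
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });

            // Checking Files.exists(savepointDir) right here would be racy:
            // the cleanup task may not have run yet. Waiting with a timeout
            // makes the check independent of executor scheduling.
            boolean deleted =
                    waitUntilDeleted(savepointDir, Duration.ofSeconds(10));
            System.out.println("deleted = " + deleted);
        } finally {
            ioExecutor.shutdownNow();
        }
    }
}
```

   An immediate existence check passes or fails depending on how quickly the 
executor picks up the cleanup task. Polling with a deadline tolerates the delay 
while still failing if the directory is never removed.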
   
   



