pnowojski commented on a change in pull request #10824:
[FLINK-15152][checkpointing] Restatrt CheckpointCoordinator if
StopWithSavepoint failed
URL: https://github.com/apache/flink/pull/10824#discussion_r365774289
##########
File path:
flink-tests/src/test/java/org/apache/flink/runtime/jobmaster/JobMasterStopWithSavepointIT.java
##########
@@ -194,6 +197,30 @@ private void
throwingExceptionOnCallbackWithRestartsHelper(final boolean termina
assertThat(getJobStatus(),
either(equalTo(JobStatus.CANCELLING)).or(equalTo(JobStatus.CANCELED)));
}
+ @Test
+ public void testRestartCheckpointCoordinatorIfStopWithSavepointFails()
throws Exception {
+ setUpJobGraph(ExceptionOnCallbackStreamTask.class,
RestartStrategies.noRestart());
+
+ try {
+ Files.setPosixFilePermissions(savepointDirectory,
Collections.emptySet());
+ } catch (IOException e) {
+ Assume.assumeNoException(e);
+ }
+
+ try {
+ stopWithSavepoint(true).get();
+ fail();
+ } catch (Exception e) {
+ assertThat(ExceptionUtils.findThrowable(e,
CheckpointException.class).isPresent(), equalTo(true));
+ }
+
+ final JobStatus jobStatus =
clusterClient.getJobStatus(jobGraph.getJobID()).get(60, TimeUnit.SECONDS);
+ assertThat(jobStatus, equalTo(JobStatus.RUNNING));
+ // assert that checkpoints are continued to be triggered
+ checkpointsToWaitFor = new CountDownLatch(1);
Review comment:
I'm not sure if this is correct and whether the newly created
`checkpointsToWaitFor` is guaranteed to be visible by the
`ExceptionOnCallbackStreamTask` instance, as `ExceptionOnCallbackStreamTask`'s
thread is instantiated before this line.
Probably the easiest fix would be to make `checkpointsToWaitFor` `volatile`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services