[ 
https://issues.apache.org/jira/browse/FLINK-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716657#comment-16716657
 ] 

ASF GitHub Bot commented on FLINK-10751:
----------------------------------------

uce closed pull request #7006: [FLINK-10751] [runtime] Retain checkpoints on suspension
URL: https://github.com/apache/flink/pull/7006
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java b/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java
index 07780c201dc..27279523cdd 100644
--- a/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java
+++ b/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java
@@ -246,7 +246,7 @@ public String toString() {
                        true,  // Delete on success
                        true,  // Delete on cancellation
                        true,  // Delete on failure
-                       true); // Delete on suspension
+                       false); // Retain on suspension
 
        private static final CheckpointProperties CHECKPOINT_RETAINED_ON_FAILURE = new CheckpointProperties(
                        false,
@@ -255,7 +255,7 @@ public String toString() {
                        true,  // Delete on success
                        true,  // Delete on cancellation
                        false, // Retain on failure
-                       true); // Delete on suspension
+                       false); // Retain on suspension
 
        private static final CheckpointProperties CHECKPOINT_RETAINED_ON_CANCELLATION = new CheckpointProperties(
                        false,
@@ -266,7 +266,6 @@ public String toString() {
                        false,  // Retain on failure
                        false); // Retain on suspension
 
-
        /**
         * Creates the checkpoint properties for a (manually triggered) savepoint.
         *
diff --git a/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointPropertiesTest.java b/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointPropertiesTest.java
index c17172b68dc..255904ea953 100644
--- a/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointPropertiesTest.java
+++ b/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointPropertiesTest.java
@@ -42,7 +42,7 @@ public void testCheckpointProperties() {
                assertTrue(props.discardOnJobFinished());
                assertTrue(props.discardOnJobCancelled());
                assertFalse(props.discardOnJobFailed());
-               assertTrue(props.discardOnJobSuspended());
+               assertFalse(props.discardOnJobSuspended());
 
                props = CheckpointProperties.forCheckpoint(CheckpointRetentionPolicy.RETAIN_ON_CANCELLATION);
 
diff --git a/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/StandaloneCompletedCheckpointStoreTest.java b/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/StandaloneCompletedCheckpointStoreTest.java
index 6f3c60b5fce..4bb5c291b34 100644
--- a/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/StandaloneCompletedCheckpointStoreTest.java
+++ b/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/StandaloneCompletedCheckpointStoreTest.java
@@ -83,8 +83,7 @@ public void testSuspendDiscardsCheckpoints() throws Exception {
 
                store.shutdown(JobStatus.SUSPENDED);
                assertEquals(0, store.getNumberOfRetainedCheckpoints());
-               assertTrue(checkpoint.isDiscarded());
-               verifyCheckpointDiscarded(taskStates);
+               assertFalse(checkpoint.isDiscarded());
        }
        
        /**


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Checkpoints should be retained when job reaches suspended state
> ---------------------------------------------------------------
>
>                 Key: FLINK-10751
>                 URL: https://issues.apache.org/jira/browse/FLINK-10751
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.6.2, 1.7.0
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Minor
>              Labels: pull-request-available
>
> {{CheckpointProperties}} define in which terminal job status a checkpoint 
> should be disposed.
> I've noticed that the properties for {{CHECKPOINT_NEVER_RETAINED}} and
> {{CHECKPOINT_RETAINED_ON_FAILURE}} prescribe checkpoint disposal in the
> (locally) terminal job status {{SUSPENDED}}.
> Since a job reaches the {{SUSPENDED}} state when its {{JobMaster}} loses
> leadership, this would result in the checkpoint being cleaned up and not
> being available for recovery by the new leader. Therefore, we should rather
> retain checkpoints when reaching job status {{SUSPENDED}}.
> *BUT:* Because we special-case this terminal state in the only highly
> available {{CompletedCheckpointStore}} implementation (see
> [ZooKeeperCompletedCheckpointStore|https://github.com/apache/flink/blob/e7ac3ba/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L315])
>  and don't use regular checkpoint disposal, this issue has not surfaced yet.
> I think we should proactively fix the properties so that they retain
> checkpoints in the {{SUSPENDED}} state. We might actually remove this case
> completely, since with this change all properties will retain on
> suspension.
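
The effect described above can be illustrated with a minimal sketch. Note that `RetentionSketch`, `Policy`, `Status`, and `availableAfterLeadershipLoss` are hypothetical names invented for this illustration; they are not Flink's actual `CheckpointProperties` API. The discard flags for the two presets are taken from the diff in this message (before and after the change), and the sketch assumes that a checkpoint is disposed exactly when its flag for the reached status is set.

```java
import java.util.EnumSet;

/** Simplified, hypothetical stand-in for the retention flags; NOT Flink's real CheckpointProperties. */
final class RetentionSketch {

    /** Job statuses relevant to disposal; SUSPENDED is only locally terminal. */
    enum Status { FINISHED, CANCELED, FAILED, SUSPENDED }

    /** A preset described by the set of statuses in which the checkpoint is discarded. */
    static final class Policy {
        private final EnumSet<Status> discarded;

        Policy(EnumSet<Status> discarded) {
            this.discarded = EnumSet.copyOf(discarded);
        }

        boolean discardOn(Status status) {
            return discarded.contains(status);
        }
    }

    // Flags before the change: both presets delete on suspension.
    static final Policy NEVER_RETAINED_BEFORE =
        new Policy(EnumSet.allOf(Status.class));
    static final Policy RETAINED_ON_FAILURE_BEFORE =
        new Policy(EnumSet.of(Status.FINISHED, Status.CANCELED, Status.SUSPENDED));

    // Flags after the change: both presets retain on suspension.
    static final Policy NEVER_RETAINED_AFTER =
        new Policy(EnumSet.of(Status.FINISHED, Status.CANCELED, Status.FAILED));
    static final Policy RETAINED_ON_FAILURE_AFTER =
        new Policy(EnumSet.of(Status.FINISHED, Status.CANCELED));

    /**
     * Assumption of this sketch: losing leadership moves the job to SUSPENDED,
     * and the checkpoint survives for the new leader only if it is not
     * discarded in that status.
     */
    static boolean availableAfterLeadershipLoss(Policy policy) {
        return !policy.discardOn(Status.SUSPENDED);
    }

    public static void main(String[] args) {
        System.out.println("before: " + availableAfterLeadershipLoss(NEVER_RETAINED_BEFORE));
        System.out.println("after:  " + availableAfterLeadershipLoss(NEVER_RETAINED_AFTER));
    }
}
```

In this model the change only affects the `SUSPENDED` case; disposal on finish, cancellation, and failure is the same before and after.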



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
