[ https://issues.apache.org/jira/browse/FLINK-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215075#comment-16215075 ]
ASF GitHub Bot commented on FLINK-7067:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/4888
[backport] [FLINK-7067] Resume checkpointing after failed
cancel-job-with-savepoint
This is a backport of #4254. I will merge this as soon as Travis gives the
green light.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 7067-backport
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/4888.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4888
----
commit 9226c3a15f8037851110fbdecf775cad99da771f
Author: Ufuk Celebi
Date: 2017-07-04T14:39:02Z
[hotfix] [tests] Reduce visibility of helper class methods
There is no need to make the helper methods public. No other class
should even use this inner test helper invokable.
commit c571929ce476f17d02ee22df0b5170b0eb322c2d
Author: Ufuk Celebi
Date: 2017-07-04T15:01:32Z
[FLINK-7067] [jobmanager] Resume periodic checkpoints after failed
cancel-job-with-savepoint
Problem: If a cancel-job-with-savepoint request fails, it has an
unintended side effect on the job if periodic checkpoints are
enabled. The periodic checkpoint scheduler is stopped before the
savepoint is triggered, but it is not restarted if the savepoint
fails and the job is therefore not cancelled.
This commit makes sure that the periodic checkpoint scheduler is
restarted if and only if periodic checkpoints were enabled before.
This closes #4254.
commit 074630a2fbd6dbdc7ff775ee9fb5d46c088dbc6d
Author: Ufuk Celebi
Date: 2017-10-23T12:42:46Z
[FLINK-7067] [jobmanager] Backport to 1.3
----
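For readers who want the control flow of the fix at a glance, here is a
minimal Java sketch of the idea described in the FLINK-7067 commit message.
It is not the code from the pull request: the class `Coordinator`, its
methods, and `cancelWithSavepoint` are hypothetical stand-ins written for
illustration, and the savepoint is modelled as a plain `Supplier` that
throws on failure.

```java
import java.util.function.Supplier;

// Illustrative sketch only; all names are assumptions, not Flink's actual API.
final class CancelWithSavepointSketch {

    /** Hypothetical stand-in for the component that owns the periodic checkpoint scheduler. */
    static final class Coordinator {
        private final boolean periodicConfigured;
        private boolean schedulerRunning;

        Coordinator(boolean periodicConfigured) {
            this.periodicConfigured = periodicConfigured;
            // The scheduler only runs if periodic checkpoints are enabled for the job.
            this.schedulerRunning = periodicConfigured;
        }

        boolean isPeriodicCheckpointingConfigured() { return periodicConfigured; }
        boolean isSchedulerRunning() { return schedulerRunning; }
        void stopCheckpointScheduler() { schedulerRunning = false; }
        void startCheckpointScheduler() { schedulerRunning = true; }
    }

    /**
     * Stops the scheduler, triggers the savepoint, and reports whether the job
     * should be cancelled. On failure the scheduler is restarted if and only if
     * periodic checkpoints were enabled before the request.
     */
    static boolean cancelWithSavepoint(Coordinator coordinator, Supplier<String> triggerSavepoint) {
        boolean wasPeriodic = coordinator.isPeriodicCheckpointingConfigured();
        coordinator.stopCheckpointScheduler();
        try {
            triggerSavepoint.get();
            // Savepoint completed: a real system would cancel the job here.
            return true;
        } catch (RuntimeException e) {
            // Savepoint failed: the job keeps running, so undo the side effect.
            if (wasPeriodic) {
                coordinator.startCheckpointScheduler();
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator(true);
        boolean cancelled = cancelWithSavepoint(coordinator, () -> {
            throw new IllegalStateException("savepoint failed");
        });
        System.out.println("cancelled=" + cancelled
                + ", schedulerRunning=" + coordinator.isSchedulerRunning());
    }
}
```

Recording `wasPeriodic` before stopping the scheduler keeps the failure path
from starting a scheduler for a job that never had periodic checkpoints.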
> Cancel with savepoint does not restart checkpoint scheduler on failure
> ----------------------------------------------------------------------
>
> Key: FLINK-7067
> URL: https://issues.apache.org/jira/browse/FLINK-7067
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.3.1
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Priority: Blocker
> Fix For: 1.4.0, 1.3.3
>
>
> The `CancelWithSavepoint` action of the JobManager first stops the checkpoint
> scheduler, then triggers a savepoint, and cancels the job after the savepoint
> completes.
> If the savepoint fails, the command should not have any side effects and we
> don't cancel the job. The issue is that the checkpoint scheduler is not
> restarted in that case, so the job keeps running without periodic checkpoints.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)