GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/4254
[FLINK-7067] [jobmanager] Fix side effects after failed
cancel-job-with-savepoint
If a cancel-job-with-savepoint request fails, it has an unintended side
effect on the respective job if that job has periodic checkpoints enabled. The
periodic checkpoint scheduler is stopped before the savepoint is triggered, but
it is not restarted if the savepoint fails and the job is not cancelled.
This fix makes sure that the periodic checkpoint scheduler is restarted if and
only if periodic checkpoints were enabled before.
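To illustrate the control flow described above, here is a minimal, synchronous sketch. It is not the code in this pull request: `SketchCoordinator`, `CancelWithSavepointSketch`, and the `Supplier`-based savepoint trigger are hypothetical stand-ins invented for illustration, and the real request handling is assumed to be asynchronous.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the checkpoint coordinator; not Flink's actual class. */
class SketchCoordinator {
    private final boolean periodicCheckpointsConfigured;
    private boolean schedulerRunning;

    SketchCoordinator(boolean periodicCheckpointsConfigured) {
        this.periodicCheckpointsConfigured = periodicCheckpointsConfigured;
        // In this sketch the scheduler runs exactly when periodic checkpoints are configured.
        this.schedulerRunning = periodicCheckpointsConfigured;
    }

    boolean isPeriodicCheckpointingConfigured() { return periodicCheckpointsConfigured; }
    boolean isSchedulerRunning() { return schedulerRunning; }
    void startCheckpointScheduler() { schedulerRunning = true; }
    void stopCheckpointScheduler() { schedulerRunning = false; }
}

class CancelWithSavepointSketch {
    /**
     * Returns true if the savepoint succeeded and the job may be cancelled,
     * false if the savepoint failed and the job keeps running.
     */
    static boolean cancelWithSavepoint(SketchCoordinator coordinator,
                                       Supplier<String> triggerSavepoint) {
        // Stop the periodic scheduler before triggering the savepoint.
        coordinator.stopCheckpointScheduler();
        try {
            String path = triggerSavepoint.get();
            System.out.println("Savepoint stored at " + path + ", cancelling job");
            return true;
        } catch (RuntimeException savepointFailure) {
            // The job keeps running, so undo the stop, but only if
            // periodic checkpoints were enabled in the first place.
            if (coordinator.isPeriodicCheckpointingConfigured()) {
                coordinator.startCheckpointScheduler();
            }
            return false;
        }
    }

    public static void main(String[] args) {
        SketchCoordinator coordinator = new SketchCoordinator(true);
        boolean cancelled = cancelWithSavepoint(coordinator, () -> {
            throw new IllegalStateException("simulated savepoint failure");
        });
        System.out.println("cancelled=" + cancelled
                + ", schedulerRunning=" + coordinator.isSchedulerRunning());
    }
}
```

The guard on `isPeriodicCheckpointingConfigured()` is the "iff" part: a job that never had periodic checkpoints must not get a scheduler started as a side effect of a failed savepoint.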
I have the test in a separate commit because it uses reflection to replace
a private field with a spied-upon instance of the CheckpointCoordinator in
order to test the expected behaviour. This is super fragile and ugly, but the
alternatives either require a large refactoring (use factories that can be set
during tests) or leave this corner case untested. The separate commit makes it
easier to remove or revert the test at a future point in time.
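For readers unfamiliar with the technique, the following is a rough sketch of swapping a private field via reflection so that a test can observe calls. It is not the test from this pull request: all class and field names (`DemoCoordinator`, `CountingCoordinator`, `DemoOwner`, `coordinator`) are hypothetical, and a hand-written counting subclass takes the place of a spy so the example needs no mocking library.

```java
import java.lang.reflect.Field;

/** Hypothetical coordinator; stands in for the real CheckpointCoordinator. */
class DemoCoordinator {
    void startCheckpointScheduler() { }
}

/** Hand-written "spy" that counts how often the scheduler is started. */
class CountingCoordinator extends DemoCoordinator {
    int startCalls;

    @Override
    void startCheckpointScheduler() { startCalls++; }
}

/** Hypothetical owner that exposes no setter or factory for its coordinator. */
class DemoOwner {
    private DemoCoordinator coordinator = new DemoCoordinator();

    void onSavepointFailure() { coordinator.startCheckpointScheduler(); }
}

class ReflectionSwapSketch {
    /** Replaces a private field by name; fails loudly if the field is renamed or removed. */
    static void replacePrivateField(Object target, String fieldName, Object replacement) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(target, replacement);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Field layout changed: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        DemoOwner owner = new DemoOwner();
        CountingCoordinator spy = new CountingCoordinator();
        replacePrivateField(owner, "coordinator", spy);

        owner.onSavepointFailure();
        System.out.println("startCheckpointScheduler calls: " + spy.startCalls);
    }
}
```

The fragility is visible in `replacePrivateField`: the field is looked up by a string, so renaming it compiles fine and only fails when the test runs.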
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 7067-restart_checkpoint_scheduler
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/4254.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4254
----
commit 7294de0ef77a346b7b38d4b3fcdc421f7fd6855b
Author: Ufuk Celebi <[email protected]>
Date: 2017-07-04T14:39:02Z
[tests] Reduce visibility of helper class methods
There is no need to make the helper methods public. No other class
should even use this inner test helper invokable.
commit ce924bc146d3cf97e0c5ddcc1ba16610b2fc8d49
Author: Ufuk Celebi <[email protected]>
Date: 2017-07-04T14:53:54Z
[FLINK-7067] [jobmanager] Add test for cancel-job-with-savepoint side
effects
I have this test in a separate commit, because it uses Reflection
to update private field with a spied upon instance of the
CheckpointCoordinator in order to test the expected behaviour. This
makes it easier to remove/revert at a future point in time.
This is super fragile and ugly, but the alternatives require a
large refactoring (use factories that can be set during tests)
or don't test this corner case behaviour.
commit 94aa444cbd7099d7830e06efe3525a717becb740
Author: Ufuk Celebi <[email protected]>
Date: 2017-07-04T15:01:32Z
[FLINK-7067] [jobmanager] Fix side effects after failed
cancel-job-with-savepoint
Problem: If a cancel-job-with-savepoint request fails, this has an
unintended side effect on the respective job if it has periodic
checkpoints enabled. The periodic checkpoint scheduler is stopped
before triggering the savepoint, but not restarted if a savepoint
fails and the job is not cancelled.
This commit makes sure that the periodic checkpoint scheduler is
restarted iff periodic checkpoints were enabled before.
----