GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/4254

    [FLINK-7067] [jobmanager] Fix side effects after failed cancel-job-with-savepoint

    A failed cancel-job-with-savepoint request has an unintended side effect
    on jobs that have periodic checkpoints enabled: the periodic checkpoint
    scheduler is stopped before the savepoint is triggered, but it is not
    restarted if the savepoint fails and the job is therefore not cancelled.
    
    This fix makes sure that the periodic checkpoint scheduler is restarted
    iff periodic checkpoints were enabled before.
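
    The intended control flow can be sketched roughly as follows. This is a
    minimal, self-contained illustration and not the code of this pull
    request: the Coordinator class and the method names stopScheduler,
    startScheduler, isSchedulerRunning and cancelWithSavepoint are
    hypothetical stand-ins, and the savepoint is modelled as a plain Supplier
    that throws on failure.

```java
import java.util.function.Supplier;

public class CancelWithSavepointSketch {

    /** Hypothetical stand-in for a checkpoint coordinator; not Flink's actual class. */
    static class Coordinator {
        private boolean schedulerRunning;

        Coordinator(boolean periodicCheckpointsEnabled) {
            // The periodic scheduler only runs if periodic checkpoints are enabled.
            this.schedulerRunning = periodicCheckpointsEnabled;
        }

        boolean isSchedulerRunning() { return schedulerRunning; }
        void stopScheduler() { schedulerRunning = false; }
        void startScheduler() { schedulerRunning = true; }
    }

    /**
     * Returns true if the savepoint succeeded (the job would then be cancelled),
     * false if it failed and the job keeps running.
     */
    static boolean cancelWithSavepoint(Coordinator coordinator, Supplier<String> triggerSavepoint) {
        // Remember whether periodic checkpoints were active before stopping the scheduler.
        boolean wasPeriodic = coordinator.isSchedulerRunning();
        coordinator.stopScheduler();
        try {
            triggerSavepoint.get();
            return true;
        } catch (RuntimeException e) {
            // Savepoint failed, job is not cancelled: restore the previous state,
            // but only if the scheduler was running before.
            if (wasPeriodic) {
                coordinator.startScheduler();
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Coordinator periodic = new Coordinator(true);
        cancelWithSavepoint(periodic, () -> { throw new IllegalStateException("savepoint failed"); });
        System.out.println(periodic.isSchedulerRunning()); // true: scheduler restarted

        Coordinator manualOnly = new Coordinator(false);
        cancelWithSavepoint(manualOnly, () -> { throw new IllegalStateException("savepoint failed"); });
        System.out.println(manualOnly.isSchedulerRunning()); // false: was never running
    }
}
```

    The second case in main is the reason for the "iff": a job without
    periodic checkpoints must not end up with a running scheduler after a
    failed savepoint.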
    
    I have put the test in a separate commit because it uses reflection to
    replace a private field with a spied-upon instance of the
    CheckpointCoordinator in order to test the expected behaviour. This is
    super fragile and ugly, but the alternatives either require a large
    refactoring (using factories that can be set during tests) or leave this
    corner case untested. The separate commit makes it easier to remove or
    revert the test at a future point in time.
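
    For readers unfamiliar with the technique, the sketch below shows the
    general shape of such a test using only the JDK. It is an illustration
    under assumptions, not the test in this pull request: Graph, Coordinator
    and the field name "coordinator" are hypothetical, and a hand-written
    recording subclass stands in for a spy created by a mocking library.

```java
import java.lang.reflect.Field;

public class ReflectionSpySketch {

    /** Hypothetical collaborator whose calls the test wants to observe. */
    static class Coordinator {
        void startScheduler() { }
    }

    /** Hand-written stand-in for a spy: counts calls instead of using a mocking library. */
    static class RecordingCoordinator extends Coordinator {
        int startCalls;

        @Override
        void startScheduler() { startCalls++; }
    }

    /** Hypothetical owner that keeps the collaborator in a private field with no setter. */
    static class Graph {
        private Coordinator coordinator = new Coordinator();

        void restartScheduler() { coordinator.startScheduler(); }
    }

    /** Overwrites a private field by name; breaks silently-at-compile-time if the field is renamed. */
    static void replacePrivateField(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not replace field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Graph graph = new Graph();
        RecordingCoordinator spy = new RecordingCoordinator();
        replacePrivateField(graph, "coordinator", spy);

        graph.restartScheduler();
        System.out.println(spy.startCalls);
    }
}
```

    The fragility is visible in replacePrivateField: the field is looked up
    by a string, so a rename compiles fine and only fails when the test runs.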


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 7067-restart_checkpoint_scheduler

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4254.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4254
    
----
commit 7294de0ef77a346b7b38d4b3fcdc421f7fd6855b
Author: Ufuk Celebi <[email protected]>
Date:   2017-07-04T14:39:02Z

    [tests] Reduce visibility of helper class methods
    
    There is no need to make the helper methods public. No other class
    should even use this inner test helper invokable.

commit ce924bc146d3cf97e0c5ddcc1ba16610b2fc8d49
Author: Ufuk Celebi <[email protected]>
Date:   2017-07-04T14:53:54Z

    [FLINK-7067] [jobmanager] Add test for cancel-job-with-savepoint side effects
    
    I have this test in a separate commit, because it uses reflection
    to update a private field with a spied-upon instance of the
    CheckpointCoordinator in order to test the expected behaviour. This
    makes it easier to remove/revert at a future point in time.
    
    This is super fragile and ugly, but the alternatives require a
    large refactoring (use factories that can be set during tests)
    or don't test this corner case behaviour.

commit 94aa444cbd7099d7830e06efe3525a717becb740
Author: Ufuk Celebi <[email protected]>
Date:   2017-07-04T15:01:32Z

    [FLINK-7067] [jobmanager] Fix side effects after failed cancel-job-with-savepoint
    
    Problem: If a cancel-job-with-savepoint request fails, this has an
    unintended side effect on the respective job if it has periodic
    checkpoints enabled. The periodic checkpoint scheduler is stopped
    before triggering the savepoint, but not restarted if a savepoint
    fails and the job is not cancelled.
    
    This commit makes sure that the periodic checkpoint scheduler is
    restarted iff periodic checkpoints were enabled before.

----


