GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/1923
[FLINK-3800] [jobmanager] Terminate ExecutionGraphs properly
This PR terminates the ExecutionGraphs properly, without restarts, when the
JobManager calls cancelAndClearEverything. This is achieved by only allowing
the method to be called with a SuppressRestartsException. The
SuppressRestartsException disables the restart strategy of the respective
ExecutionGraph. This is important because the root cause could be a
different exception. In order to avoid race conditions, the restart strategy
has to be checked twice for whether it allows the job to be restarted: once
before and once after the job has transitioned to the state RESTARTING. This
prevents ExecutionGraphs from becoming orphans.
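The double check described above can be illustrated with a minimal sketch. It is not the code from this PR: `RestartSketch`, `Graph`, `SimpleRestartStrategy`, `tryRestart` and the `betweenChecks` hook are hypothetical names, and the state machine is reduced to three states. The hook only simulates a concurrent caller that suppresses restarts right after the transition to RESTARTING.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-ins for illustration; not Flink's actual classes.
final class RestartSketch {

    enum JobStatus { FAILING, RESTARTING, FAILED }

    /** Restart strategy that can be switched off, e.g. in reaction to a SuppressRestartsException. */
    static final class SimpleRestartStrategy {
        private volatile boolean restartsAllowed = true;

        boolean canRestart() { return restartsAllowed; }

        void disable() { restartsAllowed = false; }
    }

    static final class Graph {
        final AtomicReference<JobStatus> state = new AtomicReference<>(JobStatus.FAILING);
        final SimpleRestartStrategy strategy = new SimpleRestartStrategy();

        /**
         * Tries to move the job from FAILING to RESTARTING.
         * betweenChecks runs right after the transition and stands in for a concurrent caller.
         */
        JobStatus tryRestart(Runnable betweenChecks) {
            // First check: do not even enter RESTARTING if restarts are already suppressed.
            if (!strategy.canRestart()) {
                state.set(JobStatus.FAILED);
                return state.get();
            }
            if (state.compareAndSet(JobStatus.FAILING, JobStatus.RESTARTING)) {
                betweenChecks.run();
                // Second check: restarts may have been suppressed after the first check.
                // Without it the job would stay in RESTARTING although nobody restarts it.
                if (!strategy.canRestart()) {
                    state.compareAndSet(JobStatus.RESTARTING, JobStatus.FAILED);
                }
            }
            return state.get();
        }
    }
}
```

In this sketch, a graph whose strategy is disabled between the two checks ends up in FAILED instead of remaining in RESTARTING.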
Furthermore, this PR fixes the problem that the default restart strategy is
shared by multiple jobs. The problem is solved by introducing a
RestartStrategyFactory, which creates a separate RestartStrategy instance
for every job.
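The factory idea can be sketched roughly as follows. The interfaces below are simplified, hypothetical stand-ins and not the signatures introduced by this PR; `FixedAttemptsStrategy`, `fixedAttempts`, `recordRestart` and `createRestartStrategy` are names assumed for illustration only. The point shown is that each job obtains its own stateful strategy, so restart attempts counted for one job do not affect another.

```java
// Hypothetical, simplified stand-ins for illustration; not Flink's actual interfaces.
final class FactorySketch {

    interface RestartStrategy {
        boolean canRestart();

        void recordRestart();
    }

    /** Stateful strategy: sharing one instance across jobs would mix their attempt counters. */
    static final class FixedAttemptsStrategy implements RestartStrategy {
        private final int maxAttempts;
        private int attempts;

        FixedAttemptsStrategy(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public boolean canRestart() { return attempts < maxAttempts; }

        @Override
        public void recordRestart() { attempts++; }
    }

    interface RestartStrategyFactory {
        /** Returns a fresh strategy instance on every call, one per job. */
        RestartStrategy createRestartStrategy();
    }

    static RestartStrategyFactory fixedAttempts(int maxAttempts) {
        return () -> new FixedAttemptsStrategy(maxAttempts);
    }
}
```

With a single shared factory, two jobs calling `createRestartStrategy()` receive distinct instances, so exhausting the attempts of one job leaves the other job's strategy untouched.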
- [X] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message
- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink fixJobRestart
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1923.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1923
----
commit ea05ae102428f6be8db4091b849b680112099c36
Author: Till Rohrmann <[email protected]>
Date: 2016-04-21T15:07:51Z
[FLINK-3800] [jobmanager] Terminate ExecutionGraphs properly
This PR terminates the ExecutionGraphs properly, without restarts, when the
JobManager calls cancelAndClearEverything. This is achieved by only allowing
the method to be called with a SuppressRestartsException. The
SuppressRestartsException disables the restart strategy of the respective
ExecutionGraph. This is important because the root cause could be a
different exception. In order to avoid race conditions, the restart strategy
has to be checked twice for whether it allows the job to be restarted: once
before and once after the job has transitioned to the state RESTARTING. This
prevents ExecutionGraphs from becoming orphans.
Furthermore, this PR fixes the problem that the default restart strategy is
shared by multiple jobs. The problem is solved by introducing a
RestartStrategyFactory, which creates a separate RestartStrategy instance
for every job.
----