GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/1470
Introduce RestartStrategy to decouple restarting behaviour from ExecutionGraph
Disclaimer: This PR is based on PR #1468.
## Description
This PR decouples the restart behaviour from the `ExecutionGraph` by
introducing the strategy pattern `RestartStrategy`. The `RestartStrategy`
encapsulates what will be done in case of a restart.
Currently, the following two `RestartStrategies` are supported:
* `FixedDelayRestartStrategy`: Tries to restart the job a fixed number of
times with a fixed waiting time in between. This corresponds to the current
behaviour.
* `NoRestartStrategy`: Does not restart; the job fails directly.
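The two strategies above can be illustrated with a minimal, self-contained sketch of the strategy pattern. All names below (`RestartPolicy`, `FixedDelayPolicy`, `NoRestartPolicy`, `RestartDriver`) are hypothetical and chosen for illustration; they are not the interface or classes introduced by this PR, and the actual `RestartStrategy` may expose different methods.

```java
/** Hypothetical sketch of a restart strategy; not the interface from this PR. */
interface RestartPolicy {
    /** Returns true if another restart attempt is allowed. */
    boolean canRestart();

    /** Records one restart attempt and returns the delay before it, in milliseconds. */
    long nextRestartDelayMillis();
}

/** Allows a fixed number of restarts with a fixed delay in between. */
final class FixedDelayPolicy implements RestartPolicy {
    private final int maxAttempts;
    private final long delayMillis;
    private int attemptsSoFar = 0;

    FixedDelayPolicy(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean canRestart() {
        return attemptsSoFar < maxAttempts;
    }

    @Override
    public long nextRestartDelayMillis() {
        if (!canRestart()) {
            throw new IllegalStateException("restart attempts exhausted");
        }
        attemptsSoFar++;
        return delayMillis;
    }
}

/** Never restarts; the job fails directly. */
final class NoRestartPolicy implements RestartPolicy {
    @Override
    public boolean canRestart() {
        return false;
    }

    @Override
    public long nextRestartDelayMillis() {
        throw new IllegalStateException("restarts are disabled");
    }
}

/** Stand-in for the component that asks the policy what to do on failure. */
final class RestartDriver {
    /** Counts how many restarts the policy permits for a job that keeps failing. */
    static int countRestarts(RestartPolicy policy) {
        int restarts = 0;
        while (policy.canRestart()) {
            // A real system would wait for the returned delay, then restart the job.
            policy.nextRestartDelayMillis();
            restarts++;
        }
        return restarts;
    }
}
```

The point of the sketch is that the driver only talks to the interface, so new policies can be added without touching the driver.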
This decoupling also allows us to set a restart strategy on a per-job basis.
It is therefore no longer necessary to restart the cluster in order to change
the restart delay. Furthermore, different restart strategies can be used for
concurrently running jobs.
Additionally, it is still possible to define a default restart strategy for
the cluster. This default strategy is used when there is no other
`RestartStrategy` defined for a submitted job.
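The fallback from a per-job strategy to the cluster default can be sketched as follows. This is only an illustration under stated assumptions: `RestartSettings` and `RestartSettingsRegistry` are hypothetical names, and the sketch does not reflect how the JobManager actually stores or resolves strategies.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, immutable description of a restart strategy. */
final class RestartSettings {
    final int attempts;      // 0 means "do not restart"
    final long delayMillis;  // waiting time between attempts

    private RestartSettings(int attempts, long delayMillis) {
        this.attempts = attempts;
        this.delayMillis = delayMillis;
    }

    static RestartSettings fixedDelay(int attempts, long delayMillis) {
        return new RestartSettings(attempts, delayMillis);
    }

    static RestartSettings noRestart() {
        return new RestartSettings(0, 0L);
    }
}

/** Hypothetical registry that resolves the effective settings per job. */
final class RestartSettingsRegistry {
    private final RestartSettings clusterDefault;
    private final Map<String, RestartSettings> perJob = new HashMap<>();

    RestartSettingsRegistry(RestartSettings clusterDefault) {
        this.clusterDefault = clusterDefault;
    }

    /** Registers a job; jobSettings may be null if the job defines no strategy. */
    void submitJob(String jobId, RestartSettings jobSettings) {
        if (jobSettings != null) {
            perJob.put(jobId, jobSettings);
        }
    }

    /** Returns the job's own settings, or the cluster default if none were set. */
    RestartSettings effectiveSettings(String jobId) {
        return perJob.getOrDefault(jobId, clusterDefault);
    }
}
```

With a cluster default of three attempts, a job submitted with `RestartSettings.noRestart()` keeps its own setting, while a job submitted without settings falls back to the default. Both can run at the same time.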
## API Changes
A `RestartStrategy` can be set using the `setRestartStrategy` method of
`(Stream)ExecutionEnvironment`. It looks as follows:
```
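ExecutionEnvironment env = ...
// Restart the job up to 3 times, waiting 1000 ms between attempts
env.setRestartStrategy(RestartStrategies.fixedDelay(
    3,    // number of retry attempts
    1000  // retry delay in milliseconds
));
// Alternatively, disable restarts so that the job fails directly
env.setRestartStrategy(RestartStrategies.noRestart());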
```
The default restart strategy is configured using the `restart-strategy`
configuration parameter. Depending on the chosen `RestartStrategy`, further
configuration parameters can be set. At the moment, only the
`FixedDelayRestartStrategy` takes additional parameters:
`restart-strategy.fixed-delay.attempts` and
`restart-strategy.fixed-delay.delay`.
In order to configure the `FixedDelayRestartStrategy` as the default
strategy, add the following to `flink-conf.yaml`:
```
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```
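To make the meaning of these keys concrete, here is a rough sketch of how such entries could be read from a key/value map. It is not the parsing code from this PR. The class names, the supported duration units (`ms`, `s`, `min`) and the strategy name `none` for disabling restarts are assumptions made for illustration; only the three configuration keys and the value `fixed-delay` come from the description above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical result of parsing the restart-strategy keys. */
final class ParsedRestartConfig {
    final int attempts;      // 0 means "do not restart"
    final long delayMillis;

    ParsedRestartConfig(int attempts, long delayMillis) {
        this.attempts = attempts;
        this.delayMillis = delayMillis;
    }
}

final class RestartConfigSketch {
    // Assumption: durations look like "10 s", "500 ms" or "1 min".
    private static final Pattern DURATION =
            Pattern.compile("\\s*(\\d+)\\s*(ms|s|min)\\s*");

    static long parseDurationMillis(String text) {
        Matcher m = DURATION.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported duration: " + text);
        }
        long value = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "ms": return value;
            case "s":  return value * 1000L;
            default:   return value * 60_000L; // "min"
        }
    }

    private static String require(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing configuration key: " + key);
        }
        return value;
    }

    static ParsedRestartConfig parse(Map<String, String> conf) {
        String name = require(conf, "restart-strategy").trim();
        if ("fixed-delay".equals(name)) {
            int attempts = Integer.parseInt(
                    require(conf, "restart-strategy.fixed-delay.attempts").trim());
            long delay = parseDurationMillis(
                    require(conf, "restart-strategy.fixed-delay.delay"));
            return new ParsedRestartConfig(attempts, delay);
        }
        if ("none".equals(name)) { // assumed name, not taken from the PR text
            return new ParsedRestartConfig(0, 0L);
        }
        throw new IllegalArgumentException("Unknown restart strategy: " + name);
    }
}
```

Applied to the three entries shown above, `parse` would yield three attempts with a delay of 10000 milliseconds.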
This PR removes the old configuration parameters
`execution-retries.default` and `execution-retries.delay` and is thus
**API-breaking**.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink hardensRecovery
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1470.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1470
----
commit 754c0c408d92e931218a137f388fb77f51df964a
Author: Till Rohrmann <[email protected]>
Date: 2015-12-15T14:15:12Z
Harmonize config key for number of retries and retry delay
commit dd81da02ca6eaf8e0e38cf4511e26cb553c71f72
Author: Till Rohrmann <[email protected]>
Date: 2015-12-15T16:34:17Z
Add missing param descriptions to FlinkYarnCluster, remove implicit timeout
from ApplicationClient
commit 5e967bf8a9ba066be73905338acfd5deb4894602
Author: Till Rohrmann <[email protected]>
Date: 2015-12-15T16:37:20Z
[FLINK-3184] [timeouts] Set default cluster side timeout to 10 s and the
client side timeout to 60 s.
Adapt Akka failure detector timings to respect new 10 s Akka ask timeout.
Add logging statements to JobClientActor
Introduce separation between client and cluster timeout
Sets the cluster timeout to 10 s and the client timeout to 60 s.
commit bdbd1fa2e012a7fcd847874dd15ae929e08234e7
Author: Till Rohrmann <[email protected]>
Date: 2015-12-17T12:49:10Z
[FLINK-3187] [restart] Introduce RestartStrategy to ExecutionGraph
A RestartStrategy defines how the ExecutionGraph reacts in case of a
restart. Different strategies are conceivable. For example, no restart,
fixed delay restart, exponential backoff restart, scaling in/out restart,
etc.
commit c25d3cdb088cfb7e6138b47d0712e12c4529a9e0
Author: Till Rohrmann <[email protected]>
Date: 2015-12-17T18:31:49Z
Expose RestartStrategy to user API
This removes the setNumberExecutionRetries and the setDelayBetweenRetries
on the ExecutionEnvironment and the ExecutionConfig. Instead the more
general RestartStrategy can be set. In order to maintain the separation
between the runtime and api module, one sets a RestartStrategyConfiguration
which is transformed into a RestartStrategy on the JobManager.
commit f1a118a04f87887afaa85141811618253f920a2b
Author: Till Rohrmann <[email protected]>
Date: 2015-12-18T16:50:08Z
Replace old execution-retries configuration parameters by restart-strategy.
commit 765569fa43b00ea42183cc1f371dd55d66a8949a
Author: Till Rohrmann <[email protected]>
Date: 2015-12-18T17:22:26Z
Add FixedDelayRestartStrategy test case
----