[ https://issues.apache.org/jira/browse/FLINK-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144402#comment-15144402 ]
ASF GitHub Bot commented on FLINK-3187:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1470#issuecomment-183268932

Alright, I've reintroduced the old execution attempts and delay configuration values, as well as the corresponding API calls on the `ExecutionEnvironment`. The behaviour is now the following:

1. If an explicit `RestartStrategy` is set for a job, it is used.
2. Otherwise, it is checked whether the number of retries and the retry delay have been set on the `ExecutionEnvironment`/`ExecutionConfig`. If so, a `FixedDelayRestartStrategy` is instantiated with these values.
3. If no restart strategy has been defined for the job in either of these ways, the default restart strategy of the `JobManager` is used. The default restart strategy is determined as follows:
   3.1. If the configuration contains a value for `restart-strategy`, this value defines the `RestartStrategy` to use.
   3.2. If `restart-strategy` is not set, the old `execution-retries.default` and `execution-retries.delay` configuration values are checked. If they are set with `execution-retries.default > 0` and `execution-retries.delay >= 0`, a `FixedDelayRestartStrategy` is instantiated with the respective values and used as the default restart strategy. If these values are not defined, a `NoRestartStrategy` is instantiated.

This should not be API breaking unless people used `setExecutionRetries` on the `JobGraph` or the `Plan`.
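
The resolution order described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the pull request: the class `RestartResolution`, its methods, and the string descriptions it returns are hypothetical stand-ins for the real strategy classes. It also assumes that both the retry count and the delay must be present for the fixed-delay fallback to apply, that the delay is a plain number of milliseconds, and that values which are set but fail the range check fall through to no restart.

```java
import java.util.Map;

/** Hypothetical sketch of the lookup order; strategies are represented as plain strings. */
final class RestartResolution {

    static final String NO_RESTART = "none";

    /** Describes a fixed-delay strategy (stand-in for FixedDelayRestartStrategy). */
    static String fixedDelay(int attempts, long delayMillis) {
        return "fixed-delay(attempts=" + attempts + ", delayMs=" + delayMillis + ")";
    }

    /**
     * Steps 1 to 3: explicit job strategy first, then the job-level retry
     * settings, then the cluster-wide default. Null means "not set".
     */
    static String resolveForJob(String explicitStrategy, Integer retries, Long delayMillis,
                                Map<String, String> clusterConfig) {
        if (explicitStrategy != null) {
            return explicitStrategy;                      // step 1
        }
        if (retries != null && delayMillis != null) {     // assumption: both must be set
            return fixedDelay(retries, delayMillis);      // step 2
        }
        return resolveClusterDefault(clusterConfig);      // step 3
    }

    /** Steps 3.1 and 3.2: the default derived from the cluster configuration. */
    static String resolveClusterDefault(Map<String, String> config) {
        String configured = config.get("restart-strategy");
        if (configured != null) {
            return configured;                            // step 3.1
        }
        String retries = config.get("execution-retries.default");
        String delay = config.get("execution-retries.delay");
        if (retries != null && delay != null) {
            int attempts = Integer.parseInt(retries);
            long delayMs = Long.parseLong(delay);         // assumption: plain milliseconds
            if (attempts > 0 && delayMs >= 0) {
                return fixedDelay(attempts, delayMs);     // step 3.2
            }
        }
        return NO_RESTART;                                // nothing usable configured
    }
}
```

With an empty configuration and nothing set on the job, `resolveForJob` yields `none`; setting only the two legacy keys in the configuration yields a fixed-delay description; an explicit job-level strategy always takes precedence over both.
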
> Decouple restart strategy from ExecutionGraph
> ---------------------------------------------
>
>                 Key: FLINK-3187
>                 URL: https://issues.apache.org/jira/browse/FLINK-3187
>             Project: Flink
>          Issue Type: Improvement
>   Affects Versions: 1.0.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> Currently, the {{ExecutionGraph}} supports the following restart logic:
> Whenever a failure occurs and the number of restart attempts isn't depleted,
> wait for a fixed amount of time and then try to restart. This behaviour can
> be controlled by the configuration parameters {{execution-retries.default}}
> and {{execution-retries.delay}}.
> I propose to decouple the restart logic from the {{ExecutionGraph}} a bit by
> introducing a strategy pattern. That way it would not only allow us to define
> a job-specific restart behaviour but also to implement different restart
> strategies. Conceivable strategies could be: fixed timeout restart,
> exponential backoff restart, partial topology restarts, etc.
> This change is a preliminary step towards having a restart strategy which
> will scale the parallelism of a job down in case not enough slots are
> available.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)