[jira] [Commented] (FLINK-3187) Decouple restart strategy from ExecutionGraph

ASF GitHub Bot (JIRA) Fri, 12 Feb 2016 02:47:46 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144402#comment-15144402
 ]


ASF GitHub Bot commented on FLINK-3187:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1470#issuecomment-183268932
  
    Alright, I've reintroduced the old execution-attempts and delay 
configuration values and the corresponding API calls on the `ExecutionEnvironment`.
    
    The behaviour is now the following: 
    
    1. If an explicit `RestartStrategy` is set for a job, it is taken. 
    
    2. Otherwise it is checked whether the number of retries and the retry delay 
have been set at the `ExecutionEnvironment`/`ExecutionConfig`. If this is the 
case, then a `FixedDelayRestartStrategy` is instantiated with these values.  
    
    3. Otherwise, if neither an explicit `RestartStrategy` nor retry values have 
been set for the job, the default restart strategy of the `JobManager` is used. 
The default restart strategy is determined as follows:
    
    3.1. If the configuration contains a value for `restart-strategy`, then 
this value determines the `RestartStrategy` to use.
    
    3.2. If `restart-strategy` is not set, then the old 
`execution-retries.default` and `execution-retries.delay` configuration values 
are checked. If they are set with `execution-retries.default > 0` and 
`execution-retries.delay >= 0`, then a `FixedDelayRestartStrategy` is 
instantiated with the respective values. This is then used as the default 
restart strategy. If these values are not defined, then a `NoRestartStrategy` 
is instantiated.
    
    This should not be API-breaking unless people used 
`setExecutionRetries` on the `JobGraph` or the `Plan`.
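    
    The resolution order described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the pull request: the types are simplified stand-ins that only share their names with the classes mentioned here, `RestartStrategyResolver` and `namedStrategyFactory` are hypothetical, "not set" is modelled as `null`, and the delay is assumed to be a plain number of milliseconds.

```java
import java.util.Map;
import java.util.function.Function;

/** Simplified stand-in for the strategy abstraction; not Flink's actual interface. */
interface RestartStrategy {
    String describe();
}

/** Stand-in: never restarts the job. */
final class NoRestartStrategy implements RestartStrategy {
    @Override
    public String describe() {
        return "no restart";
    }
}

/** Stand-in: restarts up to maxAttempts times, waiting delayMillis in between. */
final class FixedDelayRestartStrategy implements RestartStrategy {
    final int maxAttempts;
    final long delayMillis;

    FixedDelayRestartStrategy(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    @Override
    public String describe() {
        return "fixed delay: " + maxAttempts + " attempts, " + delayMillis + " ms";
    }
}

/** Hypothetical helper that mirrors steps 1 to 3.2 of the comment. */
final class RestartStrategyResolver {

    static RestartStrategy resolve(
            RestartStrategy explicitStrategy,   // step 1 input, null if not set
            Integer jobRetries,                 // step 2 input, null if not set
            Long jobDelayMillis,                // step 2 input, null if not set
            Map<String, String> config,         // JobManager configuration
            Function<String, RestartStrategy> namedStrategyFactory) {

        // 1. An explicitly set strategy always wins.
        if (explicitStrategy != null) {
            return explicitStrategy;
        }
        // 2. Retries and delay set for the job: fixed-delay strategy.
        if (jobRetries != null && jobDelayMillis != null) {
            return new FixedDelayRestartStrategy(jobRetries, jobDelayMillis);
        }
        // 3. Fall back to the JobManager's default strategy.
        return resolveDefault(config, namedStrategyFactory);
    }

    static RestartStrategy resolveDefault(
            Map<String, String> config,
            Function<String, RestartStrategy> namedStrategyFactory) {

        // 3.1. `restart-strategy` decides if present. How its value maps to a
        // strategy is not specified in the comment, so it is delegated here.
        String name = config.get("restart-strategy");
        if (name != null) {
            return namedStrategyFactory.apply(name);
        }
        // 3.2. Otherwise consult the old configuration values.
        String retries = config.get("execution-retries.default");
        String delay = config.get("execution-retries.delay");
        if (retries != null && delay != null) {
            int r = Integer.parseInt(retries.trim());
            long d = Long.parseLong(delay.trim()); // assumption: milliseconds
            if (r > 0 && d >= 0) {
                return new FixedDelayRestartStrategy(r, d);
            }
        }
        // Values missing (or, as an assumption, out of range): no restarts.
        return new NoRestartStrategy();
    }
}
```

    For example, under these assumptions a job without any job-level settings, running against a configuration that only contains `execution-retries.default: 3` and `execution-retries.delay: 500`, would end up with a `FixedDelayRestartStrategy` of 3 attempts and a 500 ms delay.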


> Decouple restart strategy from ExecutionGraph
> ---------------------------------------------
>
>                 Key: FLINK-3187
>                 URL: https://issues.apache.org/jira/browse/FLINK-3187
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> Currently, the {{ExecutionGraph}} supports the following restart logic: 
> Whenever a failure occurs and the number of restart attempts isn't depleted, 
> wait for a fixed amount of time and then try to restart. This behaviour can 
> be controlled by the configuration parameters {{execution-retries.default}} 
> and {{execution-retries.delay}}.
> I propose to decouple the restart logic from the {{ExecutionGraph}} a bit by 
> introducing a strategy pattern. That way it would not only allow us to define 
> a job specific restart behaviour but also to implement different restart 
> strategies. Conceivable strategies could be: Fixed timeout restart, 
> exponential backoff restart, partial topology restarts, etc.
> This change is a preliminary step towards having a restart strategy which 
> will scale the parallelism of a job down in case not enough slots are 
> available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
