[ 
https://issues.apache.org/jira/browse/FLINK-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128108#comment-15128108
 ] 

ASF GitHub Bot commented on FLINK-3187:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1470#discussion_r51557088
  
    --- Diff: docs/apis/fault_tolerance.md ---
    @@ -193,73 +193,169 @@ state updates) of Flink coupled with bundled sinks:
     
     [Back to top](#top)
     
    +Restart Strategies
    +------------------
     
    -Batch Processing Fault Tolerance (DataSet API)
    -----------------------------------------------
    +Flink supports different restart strategies which control how the jobs are restarted in case of a failure.
    +The cluster can be started with a default restart strategy which is always used when no job specific restart strategy has been defined.
    +In case that the job is submitted with a restart strategy, this strategy overrides the cluster's default setting.
    --- End diff --
    
    Yes, definitely


> Decouple restart strategy from ExecutionGraph
> ---------------------------------------------
>
>                 Key: FLINK-3187
>                 URL: https://issues.apache.org/jira/browse/FLINK-3187
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> Currently, the {{ExecutionGraph}} supports the following restart logic: 
> Whenever a failure occurs and the number of restart attempts isn't depleted, 
> wait for a fixed amount of time and then try to restart. This behaviour can 
> be controlled by the configuration parameters {{execution-retries.default}} 
> and {{execution-retries.delay}}.
> I propose to decouple the restart logic from the {{ExecutionGraph}} a bit by 
> introducing a strategy pattern. That would allow us not only to define 
> job-specific restart behaviour but also to implement different restart 
> strategies. Conceivable strategies could be: fixed timeout restart, 
> exponential backoff restart, partial topology restarts, etc.
> This change is a preliminary step towards having a restart strategy which 
> will scale the parallelism of a job down when not enough slots are 
> available.
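
To illustrate the strategy pattern proposed in the issue above, here is a minimal Java sketch of two of the conceivable strategies (fixed timeout and exponential backoff). All names in it (`RestartStrategy`, `canRestart`, `nextDelayMillis`, and both implementing classes) are hypothetical and chosen for illustration only; they are not taken from the pull request and should not be read as Flink's actual API.

```java
// Hypothetical sketch of a restart strategy abstraction; not Flink's API.
interface RestartStrategy {
    /** Returns true while another restart attempt is still allowed. */
    boolean canRestart();

    /** Records one attempt and returns the delay (ms) to wait before restarting. */
    long nextDelayMillis();
}

/** Waits the same fixed delay before each of a bounded number of restarts. */
final class FixedDelayRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final long delayMillis;
    private int attempts = 0;

    FixedDelayRestartStrategy(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public long nextDelayMillis() {
        if (!canRestart()) {
            throw new IllegalStateException("restart attempts depleted");
        }
        attempts++;
        return delayMillis;
    }
}

/** Doubles the delay after every attempt, capped at a maximum delay. */
final class ExponentialBackoffRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final long maxDelayMillis;
    private long currentDelayMillis;
    private int attempts = 0;

    ExponentialBackoffRestartStrategy(int maxAttempts, long initialDelayMillis, long maxDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.maxDelayMillis = maxDelayMillis;
        this.currentDelayMillis = Math.min(initialDelayMillis, maxDelayMillis);
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public long nextDelayMillis() {
        if (!canRestart()) {
            throw new IllegalStateException("restart attempts depleted");
        }
        attempts++;
        long delay = currentDelayMillis;
        // Double the delay for the next attempt without overflowing past the cap.
        currentDelayMillis = (currentDelayMillis > maxDelayMillis / 2)
                ? maxDelayMillis
                : currentDelayMillis * 2;
        return delay;
    }
}
```

In this sketch the component that drives restarts would only depend on the `RestartStrategy` interface, so a job-specific strategy could replace a cluster-wide default without any change to the restart loop itself.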



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
