Till Rohrmann created FLINK-3187:
------------------------------------

             Summary: Decouple restart strategy from ExecutionGraph
                 Key: FLINK-3187
                 URL: https://issues.apache.org/jira/browse/FLINK-3187
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.0.0
            Reporter: Till Rohrmann
            Assignee: Till Rohrmann
            Priority: Minor


Currently, the {{ExecutionGraph}} supports the following restart logic: 
whenever a failure occurs and the number of restart attempts has not been 
exhausted, it waits for a fixed amount of time and then tries to restart. This 
behaviour is controlled by the configuration parameters 
{{execution-retries.default}} and {{execution-retries.delay}}.

I propose to decouple the restart logic from the {{ExecutionGraph}} by 
introducing a strategy pattern. This would allow us not only to define 
job-specific restart behaviour but also to implement different restart 
strategies. Conceivable strategies include fixed timeout restart, exponential 
backoff restart, and partial topology restarts.
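
To make the proposal more concrete, here is a minimal sketch of what such a 
strategy pattern could look like. All names ({{RestartStrategy}}, 
{{FixedDelayRestartStrategy}}, {{ExponentialBackoffRestartStrategy}}, 
{{RestartDemo}}) and all method signatures are assumptions made for 
illustration only; they are not an existing Flink API and not necessarily what 
the final implementation will look like.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strategy interface (illustration only, not an existing Flink API). */
interface RestartStrategy {
    /** Returns true while another restart attempt is still allowed. */
    boolean canRestart();

    /** Records one restart attempt and returns the delay to wait before it, in milliseconds. */
    long nextDelayMillis();
}

/** Mirrors the current behaviour: a bounded number of attempts with a constant delay. */
final class FixedDelayRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final long delayMillis;
    private int attempts;

    FixedDelayRestartStrategy(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public long nextDelayMillis() {
        if (!canRestart()) {
            throw new IllegalStateException("restart attempts exhausted");
        }
        attempts++;
        return delayMillis;
    }
}

/** Doubles the delay after every attempt, capped at maxDelayMillis. */
final class ExponentialBackoffRestartStrategy implements RestartStrategy {
    private final int maxAttempts;
    private final long initialDelayMillis;
    private final long maxDelayMillis;
    private int attempts;

    ExponentialBackoffRestartStrategy(int maxAttempts, long initialDelayMillis, long maxDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.initialDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    @Override
    public boolean canRestart() {
        return attempts < maxAttempts;
    }

    @Override
    public long nextDelayMillis() {
        if (!canRestart()) {
            throw new IllegalStateException("restart attempts exhausted");
        }
        long delay = initialDelayMillis;
        // Double once per previous attempt; jump to the cap instead of overflowing.
        for (int i = 0; i < attempts && delay < maxDelayMillis; i++) {
            delay = (delay > maxDelayMillis / 2) ? maxDelayMillis : delay * 2;
        }
        attempts++;
        return Math.min(delay, maxDelayMillis);
    }
}

/** Stand-in for the caller (e.g. the ExecutionGraph), which only sees the interface. */
final class RestartDemo {
    /** Simulates repeated failures until the strategy gives up; returns the delays it asked for. */
    static List<Long> simulate(RestartStrategy strategy) {
        List<Long> delays = new ArrayList<>();
        while (strategy.canRestart()) {
            delays.add(strategy.nextDelayMillis());
        }
        return delays;
    }
}
```

In this sketch the caller depends only on the {{RestartStrategy}} interface, so 
a job could be given a different strategy without any change to the code that 
triggers the restart.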

This change is a preliminary step towards a restart strategy that scales the 
parallelism of a job down when not enough slots are available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
