[jira] [Created] (FLINK-22929) Change the default failover strategy to FixDelayRestartStrategy

Yun Gao (Jira) Tue, 08 Jun 2021 09:07:04 -0700

Yun Gao created FLINK-22929:
-------------------------------

             Summary: Change the default failover strategy to 
FixDelayRestartStrategy
                 Key: FLINK-22929
                 URL: https://issues.apache.org/jira/browse/FLINK-22929
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Coordination
    Affects Versions: 1.13.1, 1.13.0
            Reporter: Yun Gao



Currently, the default failover strategy is:
 # Streaming job without checkpointing: NoRestartStrategy
 # Streaming job with checkpointing: FixDelayRestartStrategy, as configured [in this 
method|https://github.com/apache/flink/blob/ed6b33d487bccd9fd96607a3fe681ead1912d365/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/RestartBackoffTimeStrategyFactoryLoader.java#L160]
 # Batch job: NoRestartStrategy

 

This default is reasonable for streaming jobs: without checkpoints, a streaming 
job cannot restart without paying a high cost. For batch jobs, however, 
failover is handled via persisted intermediate result partitions, and users 
usually expect a batch job to finish normally by default (as in other batch 
processing systems). It therefore seems more reasonable to make the default 
failover strategy for batch jobs the same as for streaming jobs with 
checkpointing enabled (namely FixDelayRestartStrategy).
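
To make the difference concrete, here is a minimal sketch of the current and proposed defaults as a plain decision function. It is only an illustrative model of the rules listed in this issue: the class, enum, and method names (RestartDefaults, JobType, Strategy, currentDefault, proposedDefault) are hypothetical and are not Flink classes or the logic of RestartBackoffTimeStrategyFactoryLoader.

```java
// Illustrative model only: all names here are hypothetical, not Flink APIs.
final class RestartDefaults {
    enum JobType { STREAMING, BATCH }
    enum Strategy { NO_RESTART, FIXED_DELAY }

    private RestartDefaults() {}

    // Defaults as described in this issue: only streaming jobs with
    // checkpointing enabled get a fixed-delay restart by default.
    static Strategy currentDefault(JobType type, boolean checkpointingEnabled) {
        if (type == JobType.STREAMING && checkpointingEnabled) {
            return Strategy.FIXED_DELAY;
        }
        return Strategy.NO_RESTART;
    }

    // Proposed change: batch jobs also default to fixed-delay restart;
    // streaming jobs keep their current defaults.
    static Strategy proposedDefault(JobType type, boolean checkpointingEnabled) {
        if (type == JobType.BATCH) {
            return Strategy.FIXED_DELAY;
        }
        return currentDefault(type, checkpointingEnabled);
    }
}
```

Under this model, only the batch case changes: currentDefault(JobType.BATCH, false) yields NO_RESTART, while proposedDefault(JobType.BATCH, false) yields FIXED_DELAY.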

 

Some users have also [reported related 
issues|https://lists.apache.org/thread.html/rc4135e4ab41768f5fc3d4405b980872a6e39d2c0f5c92a744c623732%40%3Cuser.flink.apache.org%3E].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
