Yun Gao created FLINK-22929:
-------------------------------
Summary: Change the default failover strategy to FixDelayRestartStrategy
Key: FLINK-22929
URL: https://issues.apache.org/jira/browse/FLINK-22929
Project: Flink
Issue Type: Improvement
Components: Runtime / Coordination
Affects Versions: 1.13.1, 1.13.0
Reporter: Yun Gao
Currently, the default failover strategy is:
# Stream Job without checkpoint: NoRestartStrategy
# Stream Job with checkpoint: FixDelayRestartStrategy, as configured [in this method|https://github.com/apache/flink/blob/ed6b33d487bccd9fd96607a3fe681ead1912d365/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/RestartBackoffTimeStrategyFactoryLoader.java#L160]
# Batch Job: NoRestartStrategy
This default is reasonable for stream jobs: without checkpoints, a stream job
cannot restart without paying a high cost.
However, batch jobs handle failover through persisted intermediate result
partitions, and users usually expect a batch job to finish normally by default
(as in other batch processing systems). It therefore seems more reasonable to
make the default failover strategy for batch jobs the same as for stream jobs
with checkpointing enabled (namely FixDelayRestartStrategy).
Some users have also [reported related
issues|https://lists.apache.org/thread.html/rc4135e4ab41768f5fc3d4405b980872a6e39d2c0f5c92a744c623732%40%3Cuser.flink.apache.org%3E].
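
To make the proposal concrete, the current defaults and the suggested change can be written as a small decision table. This is only an illustrative sketch: the class, enum, and method names below are hypothetical and are not taken from RestartBackoffTimeStrategyFactoryLoader or any other Flink class.

```java
// Illustrative model of the defaults described in this issue.
// All names here are hypothetical; this is not Flink code.
public class DefaultRestartStrategySketch {

    enum JobKind { STREAM_WITHOUT_CHECKPOINT, STREAM_WITH_CHECKPOINT, BATCH }

    enum Strategy { NO_RESTART, FIXED_DELAY }

    // Current behavior as listed above: only stream jobs with
    // checkpointing enabled get a fixed-delay restart by default.
    static Strategy currentDefault(JobKind kind) {
        switch (kind) {
            case STREAM_WITH_CHECKPOINT:
                return Strategy.FIXED_DELAY;
            default:
                return Strategy.NO_RESTART;
        }
    }

    // Proposed behavior: batch jobs follow the same default as stream
    // jobs with checkpointing; everything else stays unchanged.
    static Strategy proposedDefault(JobKind kind) {
        return kind == JobKind.BATCH ? Strategy.FIXED_DELAY : currentDefault(kind);
    }

    public static void main(String[] args) {
        for (JobKind kind : JobKind.values()) {
            System.out.println(
                    kind + ": " + currentDefault(kind) + " -> " + proposedDefault(kind));
        }
    }
}
```

Only the batch row differs between the two methods, which is the whole scope of the proposed change.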