[
https://issues.apache.org/jira/browse/FLINK-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yun Gao updated FLINK-22929:
----------------------------
Summary: Change the default failover strategy to FixDelayRestartStrategy
for batch jobs (was: Change the default failover strategy to
FixDelayRestartStrategy)
> Change the default failover strategy to FixDelayRestartStrategy for batch jobs
> ------------------------------------------------------------------------------
>
> Key: FLINK-22929
> URL: https://issues.apache.org/jira/browse/FLINK-22929
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.13.0, 1.13.1
> Reporter: Yun Gao
> Priority: Major
>
> Currently, the default failover strategies are:
> # Stream Job without checkpoint: NoRestartStrategy
> # Stream Job with checkpoint: FixDelayRestartStrategy, as configured [in this method|https://github.com/apache/flink/blob/ed6b33d487bccd9fd96607a3fe681ead1912d365/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/RestartBackoffTimeStrategyFactoryLoader.java#L160]
> # Batch Job: NoRestartStrategy
>
> The current defaults are reasonable for stream jobs, since without
> checkpoints a stream job cannot restart without paying a high cost.
> However, for batch jobs, failover is handled via persisted intermediate
> result partitions, and users usually expect a batch job to finish
> normally by default (as in other batch processing systems). Thus it seems
> more reasonable to make the default failover strategy for batch
> jobs the same as for stream jobs with checkpointing enabled (namely
> FixDelayRestartStrategy).
>
> Some users have also [reported related
> issues|https://lists.apache.org/thread.html/rc4135e4ab41768f5fc3d4405b980872a6e39d2c0f5c92a744c623732%40%3Cuser.flink.apache.org%3E].
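
The defaults listed in the description, and the proposed change for batch jobs, can be sketched as a small decision function. This is only an illustration of the selection rule as described in the issue, not Flink's actual code: the class, enum, and method names below are hypothetical, and the real logic lives in the RestartBackoffTimeStrategyFactoryLoader linked above.

```java
// Hypothetical, self-contained model of the default selection described in
// FLINK-22929. None of these names are Flink APIs.
public class DefaultRestartStrategySketch {

    enum JobType { STREAMING, BATCH }

    enum RestartStrategy { NO_RESTART, FIXED_DELAY }

    // Current defaults as listed in the issue description:
    // only stream jobs with checkpointing enabled get a fixed-delay restart.
    static RestartStrategy currentDefault(JobType jobType, boolean checkpointingEnabled) {
        if (jobType == JobType.STREAMING && checkpointingEnabled) {
            return RestartStrategy.FIXED_DELAY;
        }
        return RestartStrategy.NO_RESTART;
    }

    // Proposed defaults: batch jobs also get a fixed-delay restart,
    // everything else keeps the current behavior.
    static RestartStrategy proposedDefault(JobType jobType, boolean checkpointingEnabled) {
        if (jobType == JobType.BATCH) {
            return RestartStrategy.FIXED_DELAY;
        }
        return currentDefault(jobType, checkpointingEnabled);
    }

    public static void main(String[] args) {
        // Print both defaults for every combination of job type and checkpointing.
        for (JobType jobType : JobType.values()) {
            for (boolean checkpointing : new boolean[] {false, true}) {
                System.out.println(jobType + ", checkpointing=" + checkpointing
                        + ": current=" + currentDefault(jobType, checkpointing)
                        + ", proposed=" + proposedDefault(jobType, checkpointing));
            }
        }
    }
}
```

Until the default changes, a batch job can opt in explicitly through the existing configuration options `restart-strategy: fixed-delay`, `restart-strategy.fixed-delay.attempts`, and `restart-strategy.fixed-delay.delay`.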