[jira] [Updated] (FLINK-22929) Change the default failover strategy to FixDelayRestartStrategy for batch jobs

Yun Gao (Jira) Tue, 08 Jun 2021 09:07:07 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yun Gao updated FLINK-22929:
----------------------------
    Summary: Change the default failover strategy to FixDelayRestartStrategy 
for batch jobs  (was: Change the default failover strategy to 
FixDelayRestartStrategy)

> Change the default failover strategy to FixDelayRestartStrategy for batch jobs
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-22929
>                 URL: https://issues.apache.org/jira/browse/FLINK-22929
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0, 1.13.1
>            Reporter: Yun Gao
>            Priority: Major
>
> Currently, the default failover strategy is:
>  # Stream Job without checkpointing: NoRestartStrategy
>  # Stream Job with checkpointing: FixDelayRestartStrategy, as configured [in 
> this 
> method|https://github.com/apache/flink/blob/ed6b33d487bccd9fd96607a3fe681ead1912d365/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/RestartBackoffTimeStrategyFactoryLoader.java#L160]
>  # Batch Job: NoRestartStrategy
>  
> These defaults are reasonable for stream jobs, since without checkpoints a 
> stream job cannot restart without paying a high cost. For batch jobs, 
> however, failover is handled via persisted intermediate result partitions, 
> and users usually expect a batch job to finish normally by default (as in 
> other batch processing systems). It therefore seems more reasonable for 
> batch jobs to use the same default failover strategy as stream jobs with 
> checkpointing enabled (namely FixDelayRestartStrategy).
>  
> Some users have also [reported related 
> issues.|https://lists.apache.org/thread.html/rc4135e4ab41768f5fc3d4405b980872a6e39d2c0f5c92a744c623732%40%3Cuser.flink.apache.org%3E]
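
The current defaults and the proposed change in the description above can be summarised in a small sketch. This is only an illustrative model in plain Java: the class, enum, and method names are hypothetical and are not Flink's actual classes, and treating the checkpointing flag as irrelevant for batch jobs is an assumption of the sketch, not something the issue states.

```java
/** Hypothetical model of the defaults listed in the issue; not Flink code. */
final class DefaultRestartStrategySketch {

    enum JobType { STREAMING, BATCH }

    enum RestartKind { NO_RESTART, FIXED_DELAY }

    /** Defaults as the issue describes them for the affected versions. */
    static RestartKind currentDefault(JobType type, boolean checkpointingEnabled) {
        // Only a stream job with checkpointing gets fixed-delay restarts.
        if (type == JobType.STREAMING && checkpointingEnabled) {
            return RestartKind.FIXED_DELAY;
        }
        // Stream jobs without checkpointing and batch jobs do not restart.
        return RestartKind.NO_RESTART;
    }

    /** Defaults after the proposed change: batch jobs also use fixed delay. */
    static RestartKind proposedDefault(JobType type, boolean checkpointingEnabled) {
        // Assumption: the checkpointing flag does not matter for batch jobs.
        if (type == JobType.BATCH) {
            return RestartKind.FIXED_DELAY;
        }
        // Stream jobs keep their current behaviour.
        return currentDefault(type, checkpointingEnabled);
    }

    public static void main(String[] args) {
        // Print the current and proposed default for every combination.
        for (JobType type : JobType.values()) {
            for (boolean checkpointing : new boolean[] {false, true}) {
                System.out.println(type + ", checkpointing=" + checkpointing + ": "
                        + currentDefault(type, checkpointing) + " -> "
                        + proposedDefault(type, checkpointing));
            }
        }
    }
}
```

Only the batch rows differ between the two methods. Until the default changes, a batch job can presumably still opt in explicitly, for example through the `restart-strategy: fixed-delay` configuration option.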



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
