[ https://issues.apache.org/jira/browse/FLINK-35288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keshav Kansal updated FLINK-35288:
----------------------------------
Description:
According to the documentation, when using the Fixed Delay Restart Strategy,
*restart-strategy.fixed-delay.attempts* defines "The number of times that
Flink retries the execution before the job is declared as failed if has been
set to fixed-delay".
However, in practice it acts as the *maximum-total-task-failures*, i.e. it is
possible that the job does not even attempt to restart.
This behaviour is documented in
https://cwiki.apache.org/confluence/display/FLINK/FLIP-1%3A+Fine+Grained+Recovery+from+Task+Failures
If there is an outage at the sink level, for example an Elasticsearch outage,
all the independent tasks might fail and the job will immediately fail without
restarting (if restart-strategy.fixed-delay.attempts is set lower than or equal
to the parallelism of the sink).
was:
According to the documentation, when using the Fixed Delay Restart Strategy,
*restart-strategy.fixed-delay.attempts* defines "The number of times that
Flink retries the execution before the job is declared as failed if has been
set to fixed-delay".
However, in practice it acts as the *maximum-total-task-failures*, i.e. it is
possible that the job does not even attempt to restart.
This behaviour is documented in
https://cwiki.apache.org/confluence/display/FLINK/FLIP-1%3A+Fine+Grained+Recovery+from+Task+Failures
If there is an outage at the sink level, for example an Elasticsearch outage,
all the independent tasks might fail and the job will immediately fail without
restarting if restart-strategy.fixed-delay.attempts is set lower than or equal
to the parallelism of the sink.
> Flink Restart Strategy does not work as documented
> --------------------------------------------------
>
> Key: FLINK-35288
> URL: https://issues.apache.org/jira/browse/FLINK-35288
> Project: Flink
> Issue Type: Bug
> Reporter: Keshav Kansal
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)