1996fanrui opened a new pull request, #23247:
URL: https://github.com/apache/flink/pull/23247
The improvement is still under discussion, so I have not added tests or
updated the docs yet. I will finish them later.
## What is the purpose of the change
Currently, Flink has 3 restart strategies: fixed-delay, failure-rate and
exponential-delay.
The exponential-delay strategy is suitable if a job continues to fail for a
period of time. The fixed-delay and failure-rate strategies have a max
attempts mechanism: once the number of attempts exceeds the max attempts
threshold, the job is no longer restarted and fails.
The max attempts mechanism is reasonable: Flink should not, and does not need
to, restart a job indefinitely if it keeps failing. However, the
exponential-delay strategy does not have a max attempts mechanism.
I propose introducing the option
`restart-strategy.exponential-delay.max-attempts-before-reset` to support the
max attempts mechanism for exponential-delay. When exponential-delay is
enabled, Flink won't restart the job if the number of job failures before a
reset exceeds `max-attempts-before-reset`.
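
The proposed behaviour can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: the class `ExponentialDelaySketch`, its field names and the method `notifyFailure` are hypothetical, jitter is left out, and the reset rule (reset when the gap since the previous failure reaches a threshold) is an assumption made for illustration.

```java
// Hypothetical sketch of an exponential-delay strategy with a max-attempts cap.
// Not Flink's implementation; names and the reset rule are assumptions.
final class ExponentialDelaySketch {
    private final long initialBackoffMs;
    private final long maxBackoffMs;
    private final double multiplier;
    private final long resetThresholdMs;
    private final int maxAttemptsBeforeReset;

    private long currentBackoffMs;
    private long lastFailureTimestampMs = -1L; // -1 means no failure seen yet
    private int attemptsSinceReset = 0;

    ExponentialDelaySketch(
            long initialBackoffMs,
            long maxBackoffMs,
            double multiplier,
            long resetThresholdMs,
            int maxAttemptsBeforeReset) {
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
        this.multiplier = multiplier;
        this.resetThresholdMs = resetThresholdMs;
        this.maxAttemptsBeforeReset = maxAttemptsBeforeReset;
        this.currentBackoffMs = initialBackoffMs;
    }

    /** Records a failure at nowMs and returns true if a restart is still allowed. */
    boolean notifyFailure(long nowMs) {
        boolean firstFailure = lastFailureTimestampMs < 0;
        boolean quietLongEnough =
                !firstFailure && nowMs - lastFailureTimestampMs >= resetThresholdMs;
        if (firstFailure || quietLongEnough) {
            // Reset: start again from the initial backoff and clear the counter.
            currentBackoffMs = initialBackoffMs;
            attemptsSinceReset = 0;
        } else {
            // Still failing: grow the backoff, capped at maxBackoffMs.
            currentBackoffMs =
                    Math.min((long) (currentBackoffMs * multiplier), maxBackoffMs);
        }
        attemptsSinceReset++;
        lastFailureTimestampMs = nowMs;
        // The cap: refuse to restart once failures since the last reset exceed it.
        return attemptsSinceReset <= maxAttemptsBeforeReset;
    }

    long getBackoffMs() {
        return currentBackoffMs;
    }
}
```

With `maxAttemptsBeforeReset = 3` and failures arriving faster than the reset threshold, the first three calls to `notifyFailure` return `true` and the fourth returns `false`. A sufficiently long quiet period resets both the backoff and the counter.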
## Brief change log
- [FLINK-32895][Scheduler][hotfix] Generate the latest strategy string for
ExponentialDelayRestartBackoffTimeStrategy, because some of its information
is updated over time, such as `currentBackoffMS` and `lastFailureTimestamp`
- [FLINK-32895][Scheduler] Introduce the max attempts for Exponential Delay
Restart Strategy
- Add the option:
`restart-strategy.exponential-delay.max-attempts-before-reset`
- Add the new `exponentialDelayRestart` method in RestartStrategies
## Verifying this change
The improvement is still under discussion, so I have not added tests yet. I
will add them later.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: yes
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? It's an improvement to an
existing feature.
- If yes, how is the feature documented? docs / JavaDocs