Clara Xiong created FLINK-29819:
-----------------------------------
Summary: Record an error event when savepoint fails within grace period
Key: FLINK-29819
URL: https://issues.apache.org/jira/browse/FLINK-29819
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Reporter: Clara Xiong
Currently, SavepointObserver retries whenever a savepoint fails within the grace
period, and keeps retrying until the savepoint succeeds or a failure occurs after
the grace period. The grace period applies to each retry separately. If the
underlying cause of a quick failure is not transient, such as a misconfigured
path or a persistent storage failure, the retries keep going without recording
any error event.
We should first add logic to record an error event for each failed attempt. We
can consider capping the number of retries if this becomes a pain for users.
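
A minimal sketch of the proposed behavior, to make the intent concrete. This is
not the operator's actual SavepointObserver code: the class
SavepointRetrySketch, the Action enum, and the in-memory errorEvents list
(standing in for Kubernetes events) are all hypothetical names chosen for
illustration. It also assumes a failure exactly at the grace period boundary
still counts as "within" it.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the operator's actual SavepointObserver code.
final class SavepointRetrySketch {

    enum Action { RETRY, GIVE_UP }

    private final Duration gracePeriod;
    // Stand-in for Kubernetes events; one entry is appended per failed attempt.
    private final List<String> errorEvents = new ArrayList<>();

    SavepointRetrySketch(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    /**
     * Handles one failed savepoint attempt.
     *
     * @param attempt 1-based attempt number
     * @param elapsed time between triggering this attempt and its failure
     * @param cause   failure message reported for the attempt
     */
    Action onSavepointFailure(int attempt, Duration elapsed, String cause) {
        // Record the error first, so quick failures are visible even while retrying.
        errorEvents.add("Savepoint attempt " + attempt + " failed: " + cause);

        // The grace period applies to each attempt separately (boundary is an assumption).
        boolean withinGrace = elapsed.compareTo(gracePeriod) <= 0;
        return withinGrace ? Action.RETRY : Action.GIVE_UP;
    }

    // Returns an immutable snapshot of the events recorded so far.
    List<String> errorEvents() {
        return List.copyOf(errorEvents);
    }
}
```

With a one-minute grace period, a failure after five seconds would still be
retried, but the attempt now leaves an error event behind. A retry cap could
later be added by comparing the attempt number against a limit before
returning RETRY.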
--
This message was sent by Atlassian Jira
(v8.20.10#820010)