Feifan Wang created FLINK-27187:
-----------------------------------
Summary: The attemptsPerUpload metric may be lower than it
actually is
Key: FLINK-27187
URL: https://issues.apache.org/jira/browse/FLINK-27187
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Reporter: Feifan Wang
The attemptsPerUpload metric in ChangelogStorageMetricGroup records the
distribution of the number of attempts per upload.
In the current implementation, each successful attempt tries to update
attemptsPerUpload with its own attemptNumber.
But consider this case:
# attempt 1 times out, so attempt 2 is scheduled
# attempt 1 completes before attempt 2 and updates attemptsPerUpload with 1
In fact there were two attempts, but attemptsPerUpload is updated with 1.
So I think we should add an "actionAttemptsCount" field to
RetryExecutor.RetriableActionAttempt. This field would be shared across all
attempts to execute the same upload action and would represent the number of
upload attempts made so far. The completed attempt should then use this field
to update attemptsPerUpload.
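
To make the case and the proposal concrete, here is a minimal, self-contained sketch. It is not the actual Flink code: RecordingHistogram and Attempt are hypothetical stand-ins for the attemptsPerUpload histogram and RetryExecutor.RetriableActionAttempt, and using an AtomicInteger for the shared "actionAttemptsCount" is only an assumption about how the field could be implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class AttemptsPerUploadSketch {

    /** Hypothetical stand-in for the attemptsPerUpload histogram. */
    static final class RecordingHistogram {
        final List<Integer> values = new ArrayList<>();

        void update(int value) {
            values.add(value);
        }
    }

    /** Hypothetical simplification of RetryExecutor.RetriableActionAttempt. */
    static final class Attempt {
        final int attemptNumber;
        // Assumed field: shared by all attempts of the same upload action.
        final AtomicInteger actionAttemptsCount;

        Attempt(int attemptNumber, AtomicInteger actionAttemptsCount) {
            this.attemptNumber = attemptNumber;
            this.actionAttemptsCount = actionAttemptsCount;
            // Every attempt that gets created counts towards the shared total.
            actionAttemptsCount.incrementAndGet();
        }

        /** Creates the retry attempt, sharing the same counter. */
        Attempt next() {
            return new Attempt(attemptNumber + 1, actionAttemptsCount);
        }

        /** Behaviour described in this issue: report this attempt's own number. */
        void completeReportingOwnNumber(RecordingHistogram histogram) {
            histogram.update(attemptNumber);
        }

        /** Proposed behaviour: report the shared count of attempts. */
        void completeReportingSharedCount(RecordingHistogram histogram) {
            histogram.update(actionAttemptsCount.get());
        }
    }

    /** Replays the case above and returns {current value, proposed value}. */
    static int[] simulate() {
        RecordingHistogram current = new RecordingHistogram();
        RecordingHistogram proposed = new RecordingHistogram();

        Attempt first = new Attempt(1, new AtomicInteger());
        first.next(); // attempt 1 timed out, so attempt 2 is scheduled

        // Attempt 1 completes before attempt 2.
        first.completeReportingOwnNumber(current);
        first.completeReportingSharedCount(proposed);

        return new int[] {current.values.get(0), proposed.values.get(0)};
    }

    public static void main(String[] args) {
        int[] result = simulate();
        System.out.println("current=" + result[0] + ", proposed=" + result[1]);
    }
}
```

In this sketch the current approach records 1 while the shared counter records 2, which matches the number of attempts actually started. A real change would also have to decide whether the counter is incremented when an attempt is scheduled or when it starts running.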
What do you think? [~ym], [~roman]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)