yihua commented on code in PR #18148:
URL: https://github.com/apache/hudi/pull/18148#discussion_r2785061881
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/HoodieMetrics.java:
##########
@@ -360,6 +360,15 @@ public void updateArchiveMetrics(long durationInMs, int numInstantsArchived) {
     }
   }
+  public void emitRollbackFailure(String exceptionReason) {
+    if (config.isMetricsOn()) {
+      metrics.registerGauge(getMetricsName("rollback", "failure"), 1);
+      if (exceptionReason != null) {
+        metrics.registerGauge(getMetricsName("rollback", exceptionReason), 1);
Review Comment:
Using `exceptionReason` (which is `e.getMessage()`) as part of the metric
name could create unbounded metric cardinality: exception messages are
arbitrary strings that may contain file paths, timestamps, or other variable
content. This can cause memory pressure in the metrics registry over time.
Could you categorize the failure in some way?
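
One possible way to bound the cardinality, sketched under the assumption that the exception type is a good enough signal for categorization: map the exception to a small, fixed set of labels and use that label in the metric name instead of the message. `RollbackFailureCategory` and its label names are hypothetical and not part of Hudi.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical helper (not an existing Hudi class): maps an exception to one
 * of a fixed set of labels, so the number of distinct metric names is bounded.
 */
final class RollbackFailureCategory {
  private RollbackFailureCategory() {
  }

  /** Returns one of: "unknown", "io", "timeout", "invalid_state", "other". */
  static String classify(Throwable t) {
    if (t == null) {
      return "unknown";
    }
    if (t instanceof IOException) {
      return "io";
    }
    if (t instanceof TimeoutException) {
      return "timeout";
    }
    if (t instanceof IllegalStateException || t instanceof IllegalArgumentException) {
      return "invalid_state";
    }
    // Everything else shares one bucket, regardless of the message text.
    return "other";
  }
}
```

The call site could then pass the exception itself (or the label) rather than `e.getMessage()`, so that file paths and timestamps in the message never reach the metric name.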
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##########
@@ -1195,6 +1195,7 @@ public boolean rollback(final String commitInstantTime, Option<HoodiePendingRoll
         throw new HoodieRollbackException("Failed to rollback " + config.getBasePath() + " commits " + commitInstantTime);
       }
     } catch (Exception e) {
+      metrics.emitRollbackFailure(e.getMessage());
Review Comment:
+1
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/HoodieMetrics.java:
##########
@@ -360,6 +360,15 @@ public void updateArchiveMetrics(long durationInMs, int numInstantsArchived) {
     }
   }
+  public void emitRollbackFailure(String exceptionReason) {
+    if (config.isMetricsOn()) {
Review Comment:
Have you considered using a counter instead of a gauge here? The existing
pattern for events like this (e.g., `emitConflictResolutionFailed`) uses
`counter.inc()` so that repeated failures accumulate. With a gauge set to `1`,
multiple rollback failures would be indistinguishable from a single one.
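
To illustrate the difference, here is a minimal standalone sketch that models both behaviors with plain `AtomicLong` values. It does not use Hudi's `Metrics` class or any metrics library; `GaugeVersusCounter` and its methods are hypothetical names used only for this comparison.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical illustration: a gauge-style value that is overwritten with 1
 * on every failure versus a counter-style value that is incremented.
 */
final class GaugeVersusCounter {
  private final AtomicLong gauge = new AtomicLong();
  private final AtomicLong counter = new AtomicLong();

  /** Records one rollback failure in both styles. */
  void recordFailure() {
    gauge.set(1);              // gauge: last written value wins
    counter.incrementAndGet(); // counter: failures accumulate
  }

  long gaugeValue() {
    return gauge.get();
  }

  long counterValue() {
    return counter.get();
  }
}
```

After three calls to `recordFailure()`, `gaugeValue()` is still 1 while `counterValue()` is 3, which is the information lost when a gauge is used for event counts.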