slfan1989 commented on code in PR #13831: URL: https://github.com/apache/iceberg/pull/13831#discussion_r2287550358
##########
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/DeleteFilesProcessor.java:
##########
@@ -76,6 +79,10 @@ public void open() throws Exception {
         taskMetricGroup.counter(TableMaintenanceMetrics.DELETE_FILE_FAILED_COUNTER);
     this.succeededCounter =
         taskMetricGroup.counter(TableMaintenanceMetrics.DELETE_FILE_SUCCEEDED_COUNTER);
+    this.deleteFileTimeMsHistogram =
+        taskMetricGroup.histogram(
+            TableMaintenanceMetrics.DELETE_FILE_TIME_MS_HISTOGRAM,
+            new DescriptiveStatisticsHistogram(1000));

Review Comment:
   @pvary @Guosmilesmile We are currently investigating the integration of the ODS layer with Iceberg, though we do not yet have a concrete business scenario.

   Nevertheless, I believe adding a metric for the duration of delete operations is valuable: it introduces very low overhead but can significantly enhance system observability, helping us identify potential bottlenecks and analyze performance trends. For example, if delete times increase abnormally, we can further investigate whether the issue stems from the underlying object storage/HDFS performance or slower HMS responses.

   In this sense, latency histograms provide a more sensitive signal than simply tracking success/failure counts. Even if deletes succeed, a spike in latency can quickly reveal hidden storage or service bottlenecks, which would otherwise be difficult to detect.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
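
To make the counters-versus-histogram point concrete, here is a minimal, self-contained sketch. It is not the code in this PR and it does not use Flink's `DescriptiveStatisticsHistogram`. `WindowedHistogram`, `timedDelete`, and `simulate` are hypothetical names introduced only for illustration, and the 5 ms / 400 ms latencies are made-up inputs driven by a fake clock. The sketch also assumes the elapsed time is recorded for failed deletes as well, which is a design choice of the example rather than something the PR is known to do.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

class DeleteLatencyDemo {

  /** Hypothetical stand-in for a windowed histogram: keeps the last N samples. */
  static final class WindowedHistogram {
    private final long[] window;
    private long count;

    WindowedHistogram(int size) {
      if (size <= 0) {
        throw new IllegalArgumentException("size must be positive");
      }
      this.window = new long[size];
    }

    /** Stores a sample, overwriting the oldest one once the window is full. */
    void update(long value) {
      window[(int) (count % window.length)] = value;
      count++;
    }

    long count() {
      return count;
    }

    /** Nearest-rank quantile over the retained samples; q is expected in (0, 1]. */
    long quantile(double q) {
      int n = (int) Math.min(count, window.length);
      if (n == 0) {
        return 0L;
      }
      long[] sorted = Arrays.copyOf(window, n);
      Arrays.sort(sorted);
      int rank = (int) Math.ceil(q * n);
      return sorted[Math.max(0, Math.min(n - 1, rank - 1))];
    }
  }

  /** Runs one delete, records the elapsed milliseconds either way, and reports success. */
  static boolean timedDelete(Runnable delete, WindowedHistogram histogram, LongSupplier nanoClock) {
    long start = nanoClock.getAsLong();
    try {
      delete.run();
      return true;
    } catch (RuntimeException e) {
      return false;
    } finally {
      histogram.update(TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - start));
    }
  }

  /**
   * Simulates 100 deletes that all succeed: 95 take 5 ms and 5 take 400 ms.
   * Returns {succeeded, failed, p50, p99}.
   */
  static long[] simulate() {
    long[] fakeNowNanos = {0L}; // deterministic clock, advanced by each simulated delete
    WindowedHistogram histogram = new WindowedHistogram(1000);
    long succeeded = 0;
    long failed = 0;
    for (int i = 0; i < 100; i++) {
      long latencyMs = (i % 20 == 19) ? 400 : 5;
      Runnable delete = () -> fakeNowNanos[0] += TimeUnit.MILLISECONDS.toNanos(latencyMs);
      if (timedDelete(delete, histogram, () -> fakeNowNanos[0])) {
        succeeded++;
      } else {
        failed++;
      }
    }
    return new long[] {succeeded, failed, histogram.quantile(0.5), histogram.quantile(0.99)};
  }

  public static void main(String[] args) {
    long[] r = simulate();
    System.out.printf("succeeded=%d failed=%d p50=%dms p99=%dms%n", r[0], r[1], r[2], r[3]);
  }
}
```

In this simulated run the success and failure counters look perfectly healthy (100 and 0) and the median stays at 5 ms, while the p99 jumps to 400 ms. That tail is the kind of signal the counters alone would not surface.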