On Thu, 12 Feb 2026 at 10:39, Romain Manni-Bucau <[email protected]> wrote:
> Hi all,
>
> Is it intended that S3FileIO doesn't wire
> [aws sdk].ClientOverrideConfiguration.Builder#addMetricPublisher, so that,
> compared to hadoop-aws, you can't retrieve metrics from Spark (or any other
> engine) and send them to a collector in a centralized manner?
> Is there another intended way?

There's already a PR up awaiting review by committers:
https://github.com/apache/iceberg/pull/15122

> For plain hadoop-aws the workaround is to use (by reflection)
> S3AInstrumentation.getMetricsSystem().allSources() and wire it to a Spark
> sink.

The intended way to do it there is to use the IOStatistics API, which not
only gets you the s3a stats; Google Cloud collects stuff the same way, and
there's explicit per-thread collection of things for stream reads and
writes. Try setting fs.iostatistics.logging.level to "info" to see what
gets measured.

> To be clear, I do care about the bytes written/read but, more importantly,
> about the latency, number of requests, statuses etc. The stats exposed
> through FileSystem in Iceberg are < 10, whereas we should get >> 100 stats
> (taking Hadoop as a reference).

AWS metrics are a very limited set: software.amazon.awssdk.core.metrics.CoreMetric.
The retry count is good here, as it measures stuff beneath any application
code. With the REST signer, it'd make sense to also collect signing time,
as the RPC call to the signing endpoint would be included.

-steve
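[Editor's sketch, for context: the wiring Romain is asking about would look roughly like the following against the AWS SDK for Java v2. The class name `MetricsWiringSketch` and the println sink are illustrative only; `ClientOverrideConfiguration.Builder#addMetricPublisher` and the `MetricPublisher` interface are real SDK v2 API, but this is a minimal sketch, not what the linked PR actually does.]

```java
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.metrics.MetricCollection;
import software.amazon.awssdk.metrics.MetricPublisher;
import software.amazon.awssdk.services.s3.S3Client;

public class MetricsWiringSketch {
  public static void main(String[] args) {
    // A custom publisher: the SDK hands it one MetricCollection per API
    // call, containing CoreMetric values such as RetryCount and
    // ApiCallDuration. A real implementation would forward these to a
    // centralized collector (e.g. a Spark metrics sink) instead of stdout.
    MetricPublisher publisher = new MetricPublisher() {
      @Override
      public void publish(MetricCollection metrics) {
        System.out.println(metrics.name());
      }

      @Override
      public void close() {}
    };

    // This is the hook S3FileIO does not currently expose when it builds
    // its S3 client.
    S3Client s3 = S3Client.builder()
        .overrideConfiguration(
            ClientOverrideConfiguration.builder()
                .addMetricPublisher(publisher)
                .build())
        .build();
  }
}
```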

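[Editor's note: the fs.iostatistics.logging.level switch Steve mentions can be set as a plain Hadoop option or, under Spark, with the spark.hadoop. prefix. This is a config fragment only; the property name is real Hadoop (3.3.x+) configuration, and I believe the default level is "debug", which is why nothing appears at normal log settings.]

```properties
# core-site.xml / spark-defaults.conf: log each stream's IOStatistics
# when it is closed, at INFO level rather than the quieter default.
spark.hadoop.fs.iostatistics.logging.level=info
```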