Hi Steve,

Are you referring to org.apache.iceberg.io.FileIOMetricsContext and
org.apache.hadoop.fs.FileSystem.Statistics.StatisticsData? They miss most of
what I'm looking for (429 responses, to cite a single one).
software.amazon.awssdk.metrics helps a bit but is not sink friendly. Compared
to what hadoop-aws offers, combining the Iceberg-native stats with the AWS S3
client ones kind of compensates for the gap, but what I would really love to
see exposed is org.apache.hadoop.fs.s3a.S3AInstrumentation, and more
particularly org.apache.hadoop.fs.s3a.S3AInstrumentation.InputStreamStatistics
(my use cases are mainly reads).
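To make the "sink friendly" point concrete, here is roughly the publisher I
would like to be able to plug in (a sketch against the plain AWS SDK v2, no
Iceberg involved; the BiConsumer stands in for whatever Spark/engine sink is
available, and the metric names are mine):

import java.util.function.BiConsumer;

import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.metrics.CoreMetric;
import software.amazon.awssdk.http.HttpMetric;
import software.amazon.awssdk.metrics.MetricCollection;
import software.amazon.awssdk.metrics.MetricPublisher;
import software.amazon.awssdk.services.s3.S3Client;

// Sketch only: forwards call latency, retry counts and HTTP statuses (429s included)
// to a generic callback standing in for an engine sink.
public class SinkMetricPublisher implements MetricPublisher {
    private final BiConsumer<String, Number> sink;

    public SinkMetricPublisher(BiConsumer<String, Number> sink) {
        this.sink = sink;
    }

    @Override
    public void publish(MetricCollection metrics) {
        metrics.metricValues(CoreMetric.API_CALL_DURATION)
            .forEach(d -> sink.accept("s3.api_call_duration_ms", d.toMillis()));
        metrics.metricValues(CoreMetric.RETRY_COUNT)
            .forEach(c -> sink.accept("s3.retry_count", c));
        // the per-attempt child collections carry the HTTP status of each request
        metrics.children().forEach(attempt ->
            attempt.metricValues(HttpMetric.HTTP_STATUS_CODE)
                .forEach(status -> sink.accept("s3.http_status." + status, 1)));
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }

    // the wiring I would expect S3FileIO to expose a hook for: on a raw SDK
    // client it is a single call on the override configuration builder
    public static S3Client instrumentedClient(BiConsumer<String, Number> sink) {
        return S3Client.builder()
            .overrideConfiguration(ClientOverrideConfiguration.builder()
                .addMetricPublisher(new SinkMetricPublisher(sink))
                .build())
            .build();
    }
}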
Romain Manni-Bucau
@rmannibucau <https://x.com/rmannibucau> | .NET Blog <https://dotnetbirdie.github.io/> |
Blog <https://rmannibucau.github.io/> | Old Blog <http://rmannibucau.wordpress.com> |
Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> |
Book <https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
Javaccino founder (Java/.NET service - contact via linkedin)


On Thu, 12 Feb 2026 at 15:50, Steve Loughran <[email protected]> wrote:

>
>
> On Thu, 12 Feb 2026 at 10:39, Romain Manni-Bucau <[email protected]>
> wrote:
>
>> Hi all,
>>
>> Is it intended that S3FileIO doesn't wire [aws
>> sdk].ClientOverrideConfiguration.Builder#addMetricPublisher so basically,
>> compared to hadoop-aws you can't retrieve metrics from Spark (or any other
>> engine) and send them to a collector in a centralized manner?
>> Is there another intended way?
>>
>
> already a PR up awaiting review by committers
> https://github.com/apache/iceberg/pull/15122
>
>
>>
>> For plain hadoop-aws the workaround is to use (by reflection)
>> S3AInstrumentation.getMetricsSystem().allSources() and wire it to a
>> spark sink.
>>
>
> The intended way to do it there is to use the IOStatistics API, which not
> only lets you get at the s3a stats; google cloud collects stuff the same
> way, and there's explicit collection of things per thread for stream read
> and write....
>
> try setting
>
>   fs.iostatistics.logging.level info
>
> to see what gets measured
>
>
>> To be clear I do care about the bytes written/read but more importantly
>> about the latency, number of requests, statuses etc. The stats exposed
>> through FileSystem in iceberg are < 10 whereas we should get > 100 stats
>> (taking hadoop as a ref).
>>
>
> AWS metrics are a very limited set
>
>   software.amazon.awssdk.core.metrics.CoreMetric
>
> The retry count is good here as it measures stuff beneath any application
> code. With the rest signer, it'd make sense to also collect signing time,
> as the RPC call to the signing endpoint would be included.
>
> -steve
>
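PS: for anyone hitting the same need before the Iceberg PR lands, the
IOStatistics route on the hadoop-aws side looks roughly like this (a sketch
assuming Hadoop 3.3+; which counters a given stream exposes depends on the
connector, so take the output as indicative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsLogging;
import org.apache.hadoop.fs.statistics.IOStatisticsSupport;

// Sketch: pull per-stream and per-filesystem statistics through the IOStatistics API
// and hand the counters to whatever sink/collector is available.
public class ReadStatsSketch {
    public static void main(String[] args) throws Exception {
        Path path = new Path(args[0]); // e.g. s3a://bucket/table/data/file.parquet
        FileSystem fs = path.getFileSystem(new Configuration());

        try (FSDataInputStream in = fs.open(path)) {
            in.read(new byte[8192]);
            // stream-level stats (read ops, bytes, seeks, ...); null if the stream has none
            IOStatistics streamStats = IOStatisticsSupport.retrieveIOStatistics(in);
            if (streamStats != null) {
                System.out.println(IOStatisticsLogging.ioStatisticsToPrettyString(streamStats));
            }
        }

        // filesystem-level aggregate a sink could scrape periodically
        IOStatistics fsStats = IOStatisticsSupport.retrieveIOStatistics(fs);
        if (fsStats != null) {
            fsStats.counters().forEach((name, value) -> System.out.println(name + "=" + value));
        }
    }
}

The write side should be reachable the same way, by calling
retrieveIOStatistics on the output stream instead of the input stream.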
