Commented inline.

Romain Manni-Bucau
@rmannibucau <https://x.com/rmannibucau> | .NET Blog <https://dotnetbirdie.github.io/> | Blog <https://rmannibucau.github.io/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
Javaccino founder (Java/.NET service - contact via linkedin)
On Thu, 12 Feb 2026 at 21:13, Steve Loughran <[email protected]> wrote:

> you get all thread local stats for a specific thread from
> IOStatisticsContext.getCurrentIOStatisticsContext().getIOStatistics()

How is it supposed to work? My understanding is that it is basically a
thread-local-like implementation backed by a map - the important point being
that it works in the same bound thread - whereas the data is pulled by the
sink in a scheduled executor thread, so I would still need to keep my own
registry and sync it with the Spark metrics system, no?

> take a snapshot of that and you have something JSON marshallable or Java
> serializable which aggregates nicely
>
> Call IOStatisticsContext.getCurrentIOStatisticsContext().reset() when
> your worker thread starts a specific task to ensure you only get the stats
> for that task (s3a & I think gcs).

Do you mean implementing my own S3A or FileIO? This is the instrumentation I
tried to avoid since I think it should be built in, not in apps.

> from the fs you getIOStatistics() and you get all the stats of all
> filesystems and streams after close(), which from a quick look at some S3
> IO to a non-AWS store shows a couple of failures, interestingly enough. We
> collect separate averages for success and failure on every op so you can
> see the difference.
>
> the JMX stats we collect are a very small subset of the statistics; stuff
> like "bytes drained in close" and time to wait for an executor in the
> thread pool (action_executor_acquired) are important as they're generally a
> sign of misconfiguration.

Yep, my high-level focus is to see whether the configuration or the tables
must be tuned, so 429s, volumes and latencies are key there.

> 2026-02-12 20:05:24,587 [main] INFO statistics.IOStatisticsLogging
> (IOStatisticsLogging.java:logIOStatisticsAtLevel(269)) - IOStatistics:
> counters=((action_file_opened=1)
> (action_http_get_request=1)
> (action_http_head_request=26)
> (audit_request_execution=70)
> (audit_span_creation=22)
> (directories_created=4)
> (directories_deleted=2)
> (files_copied=2)
> (files_copied_bytes=14)
> (files_created=1)
> (files_deleted=4)
> (filesystem_close=1)
> (filesystem_initialization=1)
> (object_bulk_delete_request=1)
> (object_copy_requests=2)
> (object_delete_objects=6)
> (object_delete_request=4)
> (object_list_request=31)
> (object_metadata_request=26)
> (object_put_bytes=7)
> (object_put_request=5)
> (object_put_request_completed=5)
> (op_create=1)
> (op_createfile=2)
> (op_createfile.failures=1)
> (op_delete=3)
> (op_get_file_status=7)
> (op_get_file_status.failures=4)
> (op_hflush=1)
> (op_hsync=1)
> (op_list_files=2)
> (op_list_files.failures=1)
> (op_list_status=2)
> (op_list_status.failures=1)
> (op_mkdirs=2)
> (op_open=1)
> (op_rename=2)
> (store_client_creation=1)
> (store_io_request=70)
> (stream_read_bytes=7)
> (stream_read_close_operations=1)
> (stream_read_closed=1)
> (stream_read_opened=1)
> (stream_read_operations=1)
> (stream_read_remote_stream_drain=1)
> (stream_read_seek_policy_changed=1)
> (stream_read_total_bytes=7)
> (stream_write_block_uploads=2)
> (stream_write_bytes=7)
> (stream_write_total_data=14)
> (stream_write_total_time=290));
>
> gauges=();
>
> minimums=((action_executor_acquired.min=0)
> (action_file_opened.min=136)
> (action_http_get_request.min=140)
> (action_http_head_request.min=107)
> (filesystem_close.min=13)
> (filesystem_initialization.min=808)
> (object_bulk_delete_request.min=257)
> (object_delete_request.min=117)
> (object_list_request.min=113)
> (object_put_request.min=121)
> (op_create.min=148)
> (op_createfile.failures.min=111)
> (op_delete.min=117)
> (op_get_file_status.failures.min=226)
> (op_get_file_status.min=1)
> (op_list_files.failures.min=391)
> (op_list_files.min=138)
> (op_list_status.failures.min=458)
> (op_list_status.min=1056)
> (op_mkdirs.min=709)
> (op_rename.min=1205)
> (store_client_creation.min=718)
> (store_io_rate_limited_duration.min=0)
> (stream_read_remote_stream_drain.min=1));
>
> maximums=((action_executor_acquired.max=0)
> (action_file_opened.max=136)
> (action_http_get_request.max=140)
> (action_http_head_request.max=270)
> (filesystem_close.max=13)
> (filesystem_initialization.max=808)
> (object_bulk_delete_request.max=257)
> (object_delete_request.max=149)
> (object_list_request.max=1027)
> (object_put_request.max=289)
> (op_create.max=148)
> (op_createfile.failures.max=111)
> (op_delete.max=273)
> (op_get_file_status.failures.max=262)
> (op_get_file_status.max=254)
> (op_list_files.failures.max=391)
> (op_list_files.max=138)
> (op_list_status.failures.max=458)
> (op_list_status.max=1056)
> (op_mkdirs.max=2094)
> (op_rename.max=1523)
> (store_client_creation.max=718)
> (store_io_rate_limited_duration.max=0)
> (stream_read_remote_stream_drain.max=1));
>
> means=((action_executor_acquired.mean=(samples=1, sum=0, mean=0.0000))
> (action_file_opened.mean=(samples=1, sum=136, mean=136.0000))
> (action_http_get_request.mean=(samples=1, sum=140, mean=140.0000))
> (action_http_head_request.mean=(samples=26, sum=3543, mean=136.2692))
> (filesystem_close.mean=(samples=1, sum=13, mean=13.0000))
> (filesystem_initialization.mean=(samples=1, sum=808, mean=808.0000))
> (object_bulk_delete_request.mean=(samples=1, sum=257, mean=257.0000))
> (object_delete_request.mean=(samples=4, sum=525, mean=131.2500))
> (object_list_request.mean=(samples=31, sum=5651, mean=182.2903))
> (object_put_request.mean=(samples=5, sum=1066, mean=213.2000))
> (op_create.mean=(samples=1, sum=148, mean=148.0000))
> (op_createfile.failures.mean=(samples=1, sum=111, mean=111.0000))
> (op_delete.mean=(samples=3, sum=523, mean=174.3333))
> (op_get_file_status.failures.mean=(samples=4, sum=992, mean=248.0000))
> (op_get_file_status.mean=(samples=3, sum=365, mean=121.6667))
> (op_list_files.failures.mean=(samples=1, sum=391, mean=391.0000))
> (op_list_files.mean=(samples=1, sum=138, mean=138.0000))
> (op_list_status.failures.mean=(samples=1, sum=458, mean=458.0000))
> (op_list_status.mean=(samples=1, sum=1056, mean=1056.0000))
> (op_mkdirs.mean=(samples=2, sum=2803, mean=1401.5000))
> (op_rename.mean=(samples=2, sum=2728, mean=1364.0000))
> (store_client_creation.mean=(samples=1, sum=718, mean=718.0000))
> (store_io_rate_limited_duration.mean=(samples=5, sum=0, mean=0.0000))
> (stream_read_remote_stream_drain.mean=(samples=1, sum=1, mean=1.0000)));
>
> Anyway, no, S3FileIO doesn't have any of that. Keeps the code simple,
> which is in its favour.

Hmm, in my book that is "simple but not production friendly" versus "more
complex but usable in production", to be honest. Does that mean it will not
be enhanced? Another thing I don't get is why Spark does not reuse
hadoop-aws: it would at least enable mixing data sources more nicely and
concentrate the work in a single place (it is already done there). Happy to
help next week if you think it is generally interesting and there is a
consensus on "how".
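To make the "registry/sync" part concrete, this is the kind of glue I have
in mind - an untested sketch only: it assumes Hadoop 3.4+ for
IOStatisticsContext and a Dropwizard MetricRegistry (which is what the Spark
metrics system is built on); the class name, method names and prefix
parameter are all made up:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.MetricRegistry;

    import org.apache.hadoop.fs.statistics.IOStatisticsContext;
    import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;

    /**
     * Hypothetical glue: resets the thread-level IOStatisticsContext when a
     * task starts, snapshots it when the task ends, and exposes the counters
     * through a Dropwizard MetricRegistry.
     */
    public final class TaskIoStatsBridge {
        // latest value per counter; gauges read from here so each name is
        // only registered once
        private final Map<String, Long> lastValues = new ConcurrentHashMap<>();
        private final MetricRegistry registry;

        public TaskIoStatsBridge(MetricRegistry registry) {
            this.registry = registry;
        }

        /** Call at the start of the task, on the worker thread. */
        public void onTaskStart() {
            IOStatisticsContext.getCurrentIOStatisticsContext().reset();
        }

        /** Call at the end of the task, on the same worker thread. */
        public void onTaskEnd(String prefix) {
            // the snapshot is Serializable/JSON friendly, so it could also be
            // shipped to the driver and aggregated there instead of being
            // published locally
            IOStatisticsSnapshot snapshot =
                IOStatisticsContext.getCurrentIOStatisticsContext().snapshot();
            snapshot.counters().forEach((name, value) -> {
                String key = prefix + "." + name;
                if (lastValues.put(key, value) == null) {
                    registry.register(key, (Gauge<Long>) () -> lastValues.get(key));
                }
            });
        }
    }

Since the snapshot is Serializable, the alternative is to ship it back per
task and aggregate on the driver, which sounds closer to the "collect and
aggregate per task" goal you mention below.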
> On Thu, 12 Feb 2026 at 18:40, Romain Manni-Bucau <[email protected]>
> wrote:
>
>> hmm, I'm not sure what you propose to link it to Spark sinks, but
>> S3AInstrumentation.getMetricsSystem().allSources for hadoop-aws and
>> MetricsPublisher for Iceberg are the "least worst" solutions I came up
>> with. Clearly dirty, but more efficient than re-instrumenting the whole
>> stack everywhere (pull vs push mode).
>>
>> Do you mean I should wrap everything to read the thread local every time
>> and maintain the registry in the Spark MetricsSystem?
>>
>> Another way to see it is to open JMX when using hadoop-aws; these are the
>> graphs I want to get into Grafana at some point.
>>
>> On Thu, 12 Feb 2026 at 19:19, Steve Loughran <[email protected]> wrote:
>>
>>> ok, stream level.
>>>
>>> No, it's not the same.
>>>
>>> For those s3a input stream stats, you don't need to go into the s3a
>>> internals:
>>> 1. every source of IOStats implements InputStreamStatistics, which is
>>> hadoop-common code
>>> 2. in close(), s3a input streams update the thread-level
>>> IOStatisticsContext (https://issues.apache.org/jira/browse/HADOOP-17461
>>> ... some stabilisation, so use with Hadoop 3.4.0/Spark 4.0+)
>>>
>>> The thread stuff is so that streams opened and closed in, say, the
>>> Parquet reader update stats just for that worker thread, even though you
>>> never get near the stream instance itself.
>>>
>>> Regarding Iceberg FileIO stats, well, maybe someone could add it to the
>>> classes. Spark 4+ could think about collecting the stats for each task
>>> and aggregating, as that was the original goal. You get that aggregation
>>> indirectly on s3a with the s3a committers, similarly through ABFS, but
>>> really Spark should just collect and report it itself.
>>>
>>> On Thu, 12 Feb 2026 at 17:03, Romain Manni-Bucau <[email protected]>
>>> wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> Are you referring to org.apache.iceberg.io.FileIOMetricsContext and
>>>> org.apache.hadoop.fs.FileSystem.Statistics.StatisticsData? They miss
>>>> most of what I'm looking for (429s, to cite a single one).
>>>> software.amazon.awssdk.metrics helps a bit but is not sink friendly.
>>>> Compared to hadoop-aws, combining the Iceberg native ones and the AWS
>>>> S3 client ones kind of compensates for the lack, but what I would love
>>>> to see is org.apache.hadoop.fs.s3a.S3AInstrumentation and more
>>>> particularly
>>>> org.apache.hadoop.fs.s3a.S3AInstrumentation.InputStreamStatistics#InputStreamStatistics
>>>> (I'm mainly reading for my use cases).
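(Note for the archives: reading those stream-level stats from the outside,
with no s3a-specific class involved, is roughly the following - a sketch
only; the class name and the sample path are mine, and it assumes a
Hadoop 3.3+/3.4+ client where streams and filesystems implement
IOStatisticsSource.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.statistics.IOStatistics;
    import org.apache.hadoop.fs.statistics.IOStatisticsLogging;
    import org.apache.hadoop.fs.statistics.IOStatisticsSupport;

    public final class StreamStatsProbe {
        public static void main(String[] args) throws Exception {
            Path path = new Path(args[0]);   // e.g. s3a://bucket/table/file.parquet
            Configuration conf = new Configuration();
            try (FileSystem fs = path.getFileSystem(conf);
                 FSDataInputStream in = fs.open(path)) {
                in.read(new byte[8]);        // do some IO so counters move
                // no s3a internals: works for any stream implementing
                // IOStatisticsSource, and returns null for those that don't
                IOStatistics streamStats = IOStatisticsSupport.retrieveIOStatistics(in);
                if (streamStats != null) {
                    System.out.println(
                        IOStatisticsLogging.ioStatisticsToPrettyString(streamStats));
                }
                // the same call against `fs` gives the filesystem-level aggregate
            }
        }
    }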
>>>> On Thu, 12 Feb 2026 at 15:50, Steve Loughran <[email protected]>
>>>> wrote:
>>>>
>>>>> On Thu, 12 Feb 2026 at 10:39, Romain Manni-Bucau
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Is it intended that S3FileIO doesn't wire
>>>>>> [aws sdk].ClientOverrideConfiguration.Builder#addMetricPublisher, so
>>>>>> that, compared to hadoop-aws, you basically can't retrieve metrics
>>>>>> from Spark (or any other engine) and send them to a collector in a
>>>>>> centralized manner?
>>>>>> Is there another intended way?
>>>>>
>>>>> There's already a PR up awaiting review by committers:
>>>>> https://github.com/apache/iceberg/pull/15122
>>>>>
>>>>>> For plain hadoop-aws the workaround is to use (by reflection)
>>>>>> S3AInstrumentation.getMetricsSystem().allSources() and wire it to a
>>>>>> Spark sink.
>>>>>
>>>>> The intended way to do it there is to use the IOStatistics API, which
>>>>> not only lets you at the s3a stats; Google Cloud collects stuff the
>>>>> same way, and there's explicit collection of things per thread for
>>>>> stream read and write...
>>>>>
>>>>> try setting
>>>>>
>>>>> fs.iostatistics.logging.level info
>>>>>
>>>>> to see what gets measured
>>>>>
>>>>>> To be clear, I do care about the bytes written/read, but more
>>>>>> importantly about the latency, number of requests, statuses etc. The
>>>>>> stats exposed through FileSystem in Iceberg are < 10, whereas we
>>>>>> should get well over 100 stats (taking Hadoop as a reference).
>>>>>
>>>>> AWS metrics are a very limited set:
>>>>>
>>>>> software.amazon.awssdk.core.metrics.CoreMetric
>>>>>
>>>>> The retry count is good here as it measures stuff beneath any
>>>>> application code. With the REST signer, it'd make sense to also
>>>>> collect signing time, as the RPC call to the signing endpoint would be
>>>>> included.
>>>>>
>>>>> -steve
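For completeness, wiring a publisher on a raw AWS SDK v2 client looks
roughly like this - an untested sketch, independent of whatever
https://github.com/apache/iceberg/pull/15122 ends up exposing for S3FileIO;
the LoggingMetricPublisher class is made up, only addMetricPublisher and the
CoreMetric fields come from the discussion above:

    import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
    import software.amazon.awssdk.core.metrics.CoreMetric;
    import software.amazon.awssdk.metrics.MetricCollection;
    import software.amazon.awssdk.metrics.MetricPublisher;
    import software.amazon.awssdk.services.s3.S3Client;

    // Hypothetical publisher: prints the values discussed above; a real one
    // would push into a Spark sink / Grafana path instead of stdout.
    public final class LoggingMetricPublisher implements MetricPublisher {
        @Override
        public void publish(MetricCollection metrics) {
            metrics.metricValues(CoreMetric.RETRY_COUNT)
                   .forEach(retries -> System.out.println("retries=" + retries));
            metrics.metricValues(CoreMetric.API_CALL_DURATION)
                   .forEach(d -> System.out.println("api_call_duration=" + d));
            metrics.children().forEach(this::publish);  // per-attempt details
        }

        @Override
        public void close() {
        }

        public static S3Client buildClient() {
            return S3Client.builder()
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                    .addMetricPublisher(new LoggingMetricPublisher())
                    .build())
                .build();
        }
    }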
