steveloughran commented on PR #1187:
URL: https://github.com/apache/parquet-mr/pull/1187#issuecomment-1816906884

   It'd be really nice if there were a way to push Hadoop stream IOStatistics 
here, especially the counter, min, max and mean maps: 
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/iostatistics.html
   
   This is especially interesting for the s3, azure and gcs clients, where we 
collect stream-specific statistics, including bytes discarded in seek, time for 
GET, whether we did a HEAD first, and more. These are collected at thread 
level, but also include stats from helper threads such as those used for async 
stream draining, vector IO...
   
   It'd take a move to Hadoop 3.3.1+ to embrace the API, but if there were a 
way for something to publish stats to your metric collector, then maybe 
something could be done.
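
To make the "publish stats to your metric collector" idea concrete, here is a minimal sketch in plain Java. It assumes the statistics arrive as name-to-value maps; in Hadoop 3.3.1+ those would come from `IOStatistics.counters()`, `minimums()` and `maximums()`, but plain maps stand in here so the example has no Hadoop dependency. `IoStatsBridge` and `MetricSink` are hypothetical names, not part of Hadoop or parquet-mr.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical bridge from IOStatistics-style maps to a metric collector. */
final class IoStatsBridge {

    /** Hypothetical sink; stands in for whatever metric collector is in use. */
    interface MetricSink {
        void record(String name, long value);
    }

    /**
     * Forward every entry of the three maps to the sink.
     * Assumption: minimum and maximum keys already carry their ".min" and
     * ".max" suffixes (as in the logged output), so names do not collide.
     */
    static void publish(Map<String, Long> counters,
                        Map<String, Long> minimums,
                        Map<String, Long> maximums,
                        MetricSink sink) {
        counters.forEach((name, value) -> sink.record(name, value));
        minimums.forEach((name, value) -> sink.record(name, value));
        maximums.forEach((name, value) -> sink.record(name, value));
    }

    public static void main(String[] args) {
        // Values copied from the shutdown dump quoted in this comment.
        Map<String, Long> counters = new TreeMap<>();
        counters.put("action_http_head_request", 3L);
        counters.put("op_list_status", 9L);

        Map<String, Long> minimums = new TreeMap<>();
        minimums.put("op_list_status.min", 25L);

        Map<String, Long> maximums = new TreeMap<>();
        maximums.put("op_list_status.max", 408L);

        // A sink that just prints; a real one would feed a metrics registry.
        publish(counters, minimums, maximums,
                (name, value) -> System.out.println(name + "=" + value));
    }
}
```

Keeping the sink as a single-method interface means the reader side needs no knowledge of where the numbers came from, which is one way to keep the Hadoop version requirement isolated to the code that gathers the maps.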
   
   Tip: you can enable a dump of a filesystem's aggregate stats at process 
shutdown for azure and s3a by setting:
   ```
   fs.iostatistics.logging.level=info
   ```
   
   ```
   2023-11-17 18:30:28,634 [shutdown-hook-0] INFO  statistics.IOStatisticsLogging (IOStatisticsLogging.java:logIOStatisticsAtLevel(269)) - IOStatistics: counters=((action_http_head_request=3)
   (audit_request_execution=15)
   (audit_span_creation=12)
   (object_list_request=12)
   (object_metadata_request=3)
   (op_get_file_status=1)
   (op_glob_status=1)
   (op_list_status=9)
   (store_io_request=15));
   
   gauges=();
   
   minimums=((action_http_head_request.min=22)
   (object_list_request.min=25)
   (op_get_file_status.min=1)
   (op_glob_status.min=9)
   (op_list_status.min=25));
   
   maximums=((action_http_head_request.max=41)
   (object_list_request.max=398)
   (op_get_file_status.max=1)
   (op_glob_status.max=9)
   (op_list_status.max=408));
   
   means=((action_http_head_request.mean=(samples=3, sum=87, mean=29.0000))
   (object_list_request.mean=(samples=12, sum=708, mean=59.0000))
   (op_get_file_status.mean=(samples=1, sum=1, mean=1.0000))
   (op_glob_status.mean=(samples=1, sum=9, mean=9.0000))
   (op_list_status.mean=(samples=9, sum=814, mean=90.4444)));
   ```
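
The `means` entries in the dump carry `samples` and `sum` rather than only the mean. A small sketch of why that matters: with both values, means from separate sources can be combined by adding samples and sums. `MeanSketch` is a hypothetical stand-in written for this example, not the Hadoop class, and the split of the `op_list_status` figures across two sources is invented for illustration.

```java
import java.util.Locale;

/** Hypothetical (samples, sum) pair; illustrates the logged mean format. */
final class MeanSketch {
    final long samples;
    final long sum;

    MeanSketch(long samples, long sum) {
        this.samples = samples;
        this.sum = sum;
    }

    /** Mean is sum / samples; zero samples yields 0.0 to avoid dividing by zero. */
    double mean() {
        return samples == 0 ? 0.0 : (double) sum / samples;
    }

    /** Combine two statistics by adding their samples and sums. */
    MeanSketch add(MeanSketch other) {
        return new MeanSketch(samples + other.samples, sum + other.sum);
    }

    public static void main(String[] args) {
        // Hypothetical split of op_list_status (samples=9, sum=814) over two sources.
        MeanSketch a = new MeanSketch(4, 300);
        MeanSketch b = new MeanSketch(5, 514);
        MeanSketch total = a.add(b);
        // Prints: samples=9, sum=814, mean=90.4444
        System.out.println(String.format(Locale.ROOT,
                "samples=%d, sum=%d, mean=%.4f",
                total.samples, total.sum, total.mean()));
    }
}
```

Averaging the two partial means directly (75.0 and 102.8) would give 88.9, not the 90.4444 shown in the log, which is why the pair is the more useful thing to ship to a collector.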
   


