[ https://issues.apache.org/jira/browse/HADOOP-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114069#comment-17114069 ]
Luca Canali commented on HADOOP-16830:
--------------------------------------

We find that I/O time metrics can be quite useful for debugging, and I wanted to check whether they could make sense in the context of this JIRA.
As an example, for Apache Spark we have tested hooking up I/O timing metrics for S3A into Spark's monitoring system (and also for HDFS and other Hadoop-compatible filesystems).
From the end-user point of view, the result is I/O time instrumentation in a dashboard together with other Spark metrics (such as CPU time and run time): [example|https://www.slideshare.net/databricks/performance-troubleshooting-using-apache-spark-metrics/41]
The tested implementation relied on Spark 3.0's new plugin infrastructure [SPARK-29397|https://issues.apache.org/jira/browse/SPARK-29397], which allows external metrics to be integrated into Spark instrumentation.
Example code: [Spark's plugins to capture Hadoop IO metrics|https://github.com/cerndb/SparkPlugins/tree/master/src/main/scala/ch/cern/experimental]
Proof of concept: [implementation of some read time metrics for S3A|https://github.com/LucaCanali/hadoop/commit/4ed077061e5826711307941dd397250e2afc47a2]

> Add public IOStatistics API; S3A to support
> -------------------------------------------
>
>                 Key: HADOOP-16830
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16830
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> Applications like to collect the statistics which specific operations take,
> by collecting exactly those operations done during the execution of FS API
> calls by their individual worker threads, and returning these to their job
> driver
> * S3A has a statistics API for some streams, but it's a non-standard one;
> Impala &c can't use it
> * FileSystem storage statistics are public, but as they aren't cross-thread,
> they don't aggregate properly
> Proposed
> # A new IOStatistics interface to serve up statistics
> # S3A to implement
> # other stores to follow
> # Pass-through from the usual wrapper classes (FS data input/output streams)
> It's hard to think about how best to offer an API for operation context
> stats, and how to actually implement.
> ThreadLocal isn't enough because the helper threads need to update on the
> thread local value of the instigator
> My Initial PoC doesn't address that issue, but it shows what I'm thinking of

--
This message was sent by Atlassian Jira (v8.3.4#803005)
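
The issue description remarks that ThreadLocal isn't enough because helper threads need to update the value belonging to the instigator. A minimal Java sketch of that problem, together with one possible workaround (handing the statistics object to the helper explicitly), could look like the following. Every name in it (OperationStatsSketch, OperationStats, timedRead) is hypothetical and chosen for illustration only; none of it is Hadoop code or the proposed IOStatistics API, and the "read" is a simulated stand-in rather than a real stream call.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; these names are assumptions, not Hadoop classes. */
final class OperationStatsSketch {

    /** Thread-safe counters shared by all threads working on one operation. */
    static final class OperationStats {
        final LongAdder bytesRead = new LongAdder();
        final LongAdder readNanos = new LongAdder();
    }

    /** Per-thread slot: a value set here is visible only to the thread that set it. */
    static final ThreadLocal<OperationStats> CURRENT = new ThreadLocal<>();

    /** Simulated read that records its size and elapsed time into the given stats. */
    static void timedRead(OperationStats stats, int bytes) {
        long start = System.nanoTime();
        byte[] buffer = new byte[bytes]; // stand-in for a real stream read
        stats.bytesRead.add(buffer.length);
        stats.readNanos.add(System.nanoTime() - start);
    }

    /** The instigator sets the ThreadLocal; a pool thread then looks it up. */
    static boolean helperSeesInstigatorThreadLocal() {
        CURRENT.set(new OperationStats());
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Runs on the helper thread, whose own ThreadLocal slot is empty.
            return pool.submit(() -> CURRENT.get() != null).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            CURRENT.remove();
        }
    }

    /** The instigator passes the same stats object to the helper explicitly. */
    static long bytesWithExplicitHandoff() {
        OperationStats stats = new OperationStats();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            timedRead(stats, 512);                           // instigator thread
            pool.submit(() -> timedRead(stats, 1024)).get(); // helper thread
            return stats.bytesRead.sum();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(helperSeesInstigatorThreadLocal()); // prints false
        System.out.println(bytesWithExplicitHandoff());        // prints 1536
    }
}
```

In this sketch the helper thread never sees the instigator's ThreadLocal value, so any update it made through that lookup would be lost, whereas the explicitly shared object aggregates reads from both threads. Whether an explicit handoff is the right design for the actual API is an open question in the issue; this only illustrates the constraint.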