[ https://issues.apache.org/jira/browse/SPARK-28091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li updated SPARK-28091:
----------------------------
    Labels: release-notes  (was: )

> Extend Spark metrics system with user-defined metrics using executor plugins
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-28091
>                 URL: https://issues.apache.org/jira/browse/SPARK-28091
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Luca Canali
>            Assignee: Luca Canali
>            Priority: Minor
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> This proposes to improve Spark instrumentation by adding a hook for
> user-defined metrics, extending Spark’s Dropwizard/Codahale metrics system.
> The original motivation of this work was to add instrumentation for S3
> filesystem access metrics by Spark job. Currently, [[ExecutorSource]]
> instruments HDFS and local filesystem metrics. Rather than extending the
> code there, this JIRA proposes to add a metrics plugin system, which is more
> flexible and of more general use.
> Context: The Spark metrics system provides a large variety of metrics
> useful to monitor and troubleshoot Spark workloads. A typical workflow is to
> sink the metrics to a storage system and build dashboards on top of that.
> Highlights:
> * The metrics plugin system makes it easy to implement instrumentation for
> S3 access by Spark jobs.
> * The metrics plugin system allows for easy extensions of how Spark collects
> HDFS-related workload metrics. This is currently done using the Hadoop
> Filesystem GetAllStatistics method, which is deprecated in recent versions
> of Hadoop. Recent versions of Hadoop Filesystem recommend using the method
> GetGlobalStorageStatistics, which also provides several additional metrics.
> GetGlobalStorageStatistics is not available in Hadoop 2.7 (it was introduced
> in Hadoop 2.8). A metrics plugin for Spark would give those deploying
> suitable Hadoop versions an easy way to “opt in” to such new API calls.
> * We also have the use case of adding Hadoop filesystem monitoring for a
> custom Hadoop-compliant filesystem in use in our organization (EOS using the
> XRootD protocol). The metrics plugin infrastructure makes this easy to do.
> Others may have similar use cases.
> * More generally, this method makes it straightforward to plug filesystem
> and other metrics into the Spark monitoring system. Future work on plugin
> implementation can extend monitoring to measure the usage of external
> resources (OS, filesystem, network, accelerator cards, etc.) that might not
> normally be considered general enough for inclusion in Apache Spark code,
> but that can nevertheless be useful for specialized use cases, tests, or
> troubleshooting.
> Implementation:
> The proposed implementation builds on top of the work on Executor Plugin of
> SPARK-24918 and on recent work on extending Spark executor metrics, such as
> SPARK-25228.
> Tests and examples:
> So far, this has been tested manually by running Spark on YARN and K8S
> clusters, in particular for monitoring S3 and for extending HDFS
> instrumentation with the Hadoop Filesystem “GetGlobalStorageStatistics”
> metrics. Executor metric plugin example and code used for testing are
> available.
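
The hook idea described in the proposal can be illustrated with a minimal sketch, written here in Python for brevity. This is not Spark's plugin API or the Dropwizard library: `MetricRegistry`, `S3ReadBytesPlugin`, `load_plugins`, and the metric name `s3a.read_bytes` are hypothetical names invented for this example. It only shows the general pattern of a plugin registering a user-defined gauge that a sink can later poll.

```python
from typing import Callable, Dict, Iterable


class MetricRegistry:
    """Hypothetical stand-in for a Dropwizard-style registry of gauges."""

    def __init__(self) -> None:
        self._gauges: Dict[str, Callable[[], int]] = {}

    def register(self, name: str, gauge: Callable[[], int]) -> None:
        # Refuse duplicates so two plugins cannot silently shadow each other.
        if name in self._gauges:
            raise ValueError(f"metric already registered: {name}")
        self._gauges[name] = gauge

    def snapshot(self) -> Dict[str, int]:
        # A sink would call this periodically; each gauge is read on demand.
        return {name: gauge() for name, gauge in self._gauges.items()}


class S3ReadBytesPlugin:
    """Hypothetical plugin exposing one user-defined metric."""

    def __init__(self, read_bytes: Callable[[], int]) -> None:
        # read_bytes is assumed to return the current filesystem counter.
        self._read_bytes = read_bytes

    def init(self, registry: MetricRegistry) -> None:
        # The hook: the plugin adds its own gauge to the shared registry.
        registry.register("s3a.read_bytes", self._read_bytes)


def load_plugins(registry: MetricRegistry, plugins: Iterable) -> None:
    # The host calls init() once per configured plugin at startup.
    for plugin in plugins:
        plugin.init(registry)
```

In this sketch the host process owns the registry and the sinks, while each plugin contributes only its gauges, which is why specialized metrics can be added without changing the core instrumentation code.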