[ 
https://issues.apache.org/jira/browse/SPARK-28091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28091.
------------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 24901
[https://github.com/apache/spark/pull/24901]

> Extend Spark metrics system with user-defined metrics using executor plugins
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-28091
>                 URL: https://issues.apache.org/jira/browse/SPARK-28091
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Luca Canali
>            Assignee: Luca Canali
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> This proposes to improve Spark instrumentation by adding a hook for 
> user-defined metrics, extending Spark’s Dropwizard/Codahale metrics system.
> The original motivation of this work was to add instrumentation for S3 
> filesystem access metrics by Spark job. Currently, [[ExecutorSource]] 
> instruments HDFS and local filesystem metrics. Rather than extending the code 
> there, we propose with this JIRA to add a metrics plugin system, which is 
> more flexible and of more general use.
> Context: The Spark metrics system provides a large variety of metrics, see 
> also , useful to monitor and troubleshoot Spark workloads. A typical 
> workflow is to sink the metrics to a storage system and build dashboards on 
> top of that.
> Highlights:
>  * The metrics plugin system makes it easy to implement instrumentation for 
> S3 access by Spark jobs.
>  * The metrics plugin system allows for easy extension of how Spark collects 
> HDFS-related workload metrics. This is currently done using the Hadoop 
> FileSystem getAllStatistics method, which is deprecated in recent versions of 
> Hadoop. Recent versions of Hadoop recommend using the 
> getGlobalStorageStatistics method instead, which also provides several 
> additional metrics. getGlobalStorageStatistics is not available in Hadoop 2.7 
> (it was introduced in Hadoop 2.8). A metrics plugin for Spark would give 
> those deploying suitable Hadoop versions an easy way to “opt in” to such new 
> API calls.
>  * We also have the use case of adding Hadoop filesystem monitoring for a 
> custom Hadoop-compliant filesystem in use in our organization (EOS using the 
> XRootD protocol). The metrics plugin infrastructure makes this easy to do. 
> Others may have similar use cases.
>  * More generally, this method makes it straightforward to plug filesystem 
> and other metrics into the Spark monitoring system. Future work on plugin 
> implementations can extend monitoring to measure usage of external 
> resources (OS, filesystem, network, accelerator cards, etc.) that might not 
> normally be considered general enough for inclusion in Apache Spark code, 
> but that can nevertheless be useful for specialized use cases, tests, or 
> troubleshooting.
> Implementation:
> The proposed implementation builds on the Executor Plugin work of 
> SPARK-24918 and on recent work extending Spark executor metrics, such as 
> SPARK-25228.
> Tests and examples:
> So far this has been tested manually by running Spark on YARN and K8S 
> clusters, in particular for monitoring S3 and for extending HDFS 
> instrumentation with the Hadoop FileSystem “getGlobalStorageStatistics” 
> metrics. An example executor metrics plugin and the code used for testing are 
> available.
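
To illustrate the general idea in the description above (a plugin that registers user-defined gauges with a metrics registry, which a sink later polls), here is a minimal Java sketch using only the standard library. It is not the code from pull request 24901 and it does not use the actual Spark, Dropwizard, or Hadoop APIs: GaugeRegistry, FsCounters, MetricsPlugin, S3MetricsPlugin, and the metric names are all hypothetical stand-ins chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class MetricsPluginSketch {

    /** Hypothetical stand-in for a metrics registry: metric name -> gauge read on demand. */
    public static final class GaugeRegistry {
        private final Map<String, LongSupplier> gauges = new ConcurrentHashMap<>();

        public void register(String name, LongSupplier gauge) {
            // Reject duplicate names so two plugins cannot silently overwrite each other.
            if (gauges.putIfAbsent(name, gauge) != null) {
                throw new IllegalArgumentException("Metric already registered: " + name);
            }
        }

        public long read(String name) {
            LongSupplier gauge = gauges.get(name);
            if (gauge == null) {
                throw new IllegalArgumentException("Unknown metric: " + name);
            }
            // The value is computed at read time, so a sink always sees the current count.
            return gauge.getAsLong();
        }
    }

    /** Hypothetical stand-in for the counters a filesystem client would maintain. */
    public static final class FsCounters {
        public final AtomicLong bytesRead = new AtomicLong();
        public final AtomicLong readOps = new AtomicLong();
    }

    /** Hypothetical plugin hook: called once so the plugin can register its metrics. */
    public interface MetricsPlugin {
        void init(GaugeRegistry registry);
    }

    /** Example plugin exposing S3-style read counters under assumed metric names. */
    public static final class S3MetricsPlugin implements MetricsPlugin {
        private final FsCounters counters;

        public S3MetricsPlugin(FsCounters counters) {
            this.counters = counters;
        }

        @Override
        public void init(GaugeRegistry registry) {
            registry.register("filesystem.s3a.read_bytes", counters.bytesRead::get);
            registry.register("filesystem.s3a.read_ops", counters.readOps::get);
        }
    }

    public static void main(String[] args) {
        FsCounters counters = new FsCounters();
        GaugeRegistry registry = new GaugeRegistry();
        new S3MetricsPlugin(counters).init(registry);

        // Simulate two reads happening after the plugin has been initialized.
        counters.bytesRead.addAndGet(1024);
        counters.readOps.incrementAndGet();
        counters.bytesRead.addAndGet(512);
        counters.readOps.incrementAndGet();

        System.out.println("read_bytes=" + registry.read("filesystem.s3a.read_bytes"));
        System.out.println("read_ops=" + registry.read("filesystem.s3a.read_ops"));
    }
}
```

The gauges hold a reference to the counters rather than a copied value, which is why activity that happens after registration is still visible when the registry is read. In a real deployment the counters would presumably come from the filesystem's own statistics (for example the Hadoop storage statistics mentioned above) rather than from a hand-maintained class.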


