[
https://issues.apache.org/jira/browse/SPARK-28091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866336#comment-16866336
]
Luca Canali edited comment on SPARK-28091 at 6/18/19 8:07 AM:
--------------------------------------------------------------
[~irashid] given your work on SPARK-24918, you may be interested in commenting
on this.
was (Author: lucacanali):
@irashid given your work on SPARK-24918 you may be interested to comment on
this?
> Extend Spark metrics system with executor plugin metrics
> --------------------------------------------------------
>
> Key: SPARK-28091
> URL: https://issues.apache.org/jira/browse/SPARK-28091
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Luca Canali
> Priority: Minor
>
> This proposes to improve Spark instrumentation by adding a hook for Spark
> executor plugin metrics to the Spark metrics system, which is implemented with
> the Dropwizard/Codahale library.
> Context: The Spark metrics system provides a large variety of metrics, see
> also SPARK-26890, useful to monitor and troubleshoot Spark workloads. A
> typical workflow is to sink the metrics to a storage system and build
> dashboards on top of that.
> Improvement: The original goal of this work was to add instrumentation for S3
> filesystem access metrics by Spark job. Currently, [[ExecutorSource]]
> instruments HDFS and local filesystem metrics. Rather than extending the code
> there, we propose to add a metrics plugin system, which is more flexible and
> of more general use.
> Advantages:
> * The metrics plugin system makes it easy to implement instrumentation for S3
> access by Spark jobs.
> * The metrics plugin system allows for easy extensions of how Spark collects
> HDFS-related workload metrics. This is currently done using the Hadoop
> FileSystem getAllStatistics method, which is deprecated in recent versions of
> Hadoop. Recent versions of Hadoop FileSystem recommend using the
> getGlobalStorageStatistics method, which also provides several additional
> metrics. getGlobalStorageStatistics is not available in Hadoop 2.7 (it was
> introduced in Hadoop 2.8). A metrics plugin for Spark would provide an easy
> way to “opt in” to such new API calls for those deploying suitable Hadoop
> versions.
> * We also have the use case of adding Hadoop filesystem monitoring for a
> custom Hadoop-compliant filesystem in use in our organization (EOS using the
> XRootD protocol). The metrics plugin infrastructure makes this easy to do.
> Others may have similar use cases.
> * More generally, this method makes it straightforward to plug filesystem
> and other metrics into the Spark monitoring system. Future work on plugin
> implementation can address extending monitoring to measure usage of external
> resources (OS, filesystem, network, accelerator cards, etc.) that might not
> normally be considered general enough for inclusion in Apache Spark code,
> but that can nevertheless be useful for specialized use cases, tests, or
> troubleshooting.
> Implementation:
> The proposed implementation is currently a WIP open for comments and
> improvements. It is based on the work on Executor Plugin of SPARK-24918 and
> builds on recent work on extending Spark executor metrics, such as SPARK-25228.
> Tests and examples:
> So far, this has been manually tested by running Spark on YARN and K8S
> clusters, in particular for monitoring S3 and for extending HDFS
> instrumentation with the Hadoop FileSystem “getGlobalStorageStatistics”
> metrics. An example executor metrics plugin and the code used for testing are
> available.
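
The plugin idea described in the issue above can be illustrated with a minimal, dependency-free Java sketch. It is not the proposed patch and not Spark or Dropwizard API: `Registry` is a hypothetical stand-in for a Dropwizard-style metric registry, and `MetricsPlugin` and `FilesystemMetricsPlugin` are invented names used only for illustration. The metric names loosely follow the `filesystem.<scheme>.read_bytes` style, which is also an assumption here.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class MetricsPluginSketch {

    /** Hypothetical stand-in for a metric registry: name -> lazily read gauge. */
    static final class Registry {
        private final Map<String, Supplier<Long>> gauges = new TreeMap<>();

        void register(String name, Supplier<Long> gauge) {
            // Reject duplicate names so two plugins cannot silently clash.
            if (gauges.putIfAbsent(name, gauge) != null) {
                throw new IllegalArgumentException("Metric already registered: " + name);
            }
        }

        /** Reads every gauge now, as a sink might do on each reporting period. */
        Map<String, Long> snapshot() {
            Map<String, Long> values = new TreeMap<>();
            gauges.forEach((name, gauge) -> values.put(name, gauge.get()));
            return values;
        }
    }

    /** Hypothetical plugin contract: invoked once to register the plugin's gauges. */
    interface MetricsPlugin {
        void registerMetrics(Registry registry);
    }

    /** Hypothetical plugin exposing byte counters for one filesystem scheme. */
    static final class FilesystemMetricsPlugin implements MetricsPlugin {
        private final String scheme;
        final AtomicLong bytesRead = new AtomicLong();
        final AtomicLong bytesWritten = new AtomicLong();

        FilesystemMetricsPlugin(String scheme) {
            this.scheme = scheme;
        }

        @Override
        public void registerMetrics(Registry registry) {
            // Gauges are evaluated at snapshot time, so they always report current values.
            registry.register("filesystem." + scheme + ".read_bytes", bytesRead::get);
            registry.register("filesystem." + scheme + ".write_bytes", bytesWritten::get);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        FilesystemMetricsPlugin s3a = new FilesystemMetricsPlugin("s3a");
        s3a.registerMetrics(registry);
        s3a.bytesRead.addAndGet(4096); // simulate I/O performed after registration
        System.out.println(registry.snapshot());
    }
}
```

In this sketch the core code only needs to know the `MetricsPlugin` contract, so instrumentation for another filesystem or resource could be added by supplying another implementation rather than by changing the registry or its callers.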
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)