Ryan Williams created SPARK-5847:
------------------------------------

             Summary: Allow for configuring MetricsSystem's use of app ID to namespace all metrics
                 Key: SPARK-5847
                 URL: https://issues.apache.org/jira/browse/SPARK-5847
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.2.1
            Reporter: Ryan Williams
            Priority: Minor


{{MetricsSystem}} [currently prepends the app ID to all metrics|https://github.com/apache/spark/blob/c51ab37faddf4ede23243058dfb388e74a192552/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L131].
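
Paraphrasing the linked code (a sketch, not the verbatim source), each
source's registry name gets the app ID prepended whenever one is set:

{code:scala}
// Sketch (paraphrase, not verbatim) of the linked MetricsSystem logic:
// each source's registry name is prefixed with spark.app.id when set.
import com.codahale.metrics.MetricRegistry

val regName = conf.getOption("spark.app.id") match {  // conf: SparkConf
  case Some(appId) => MetricRegistry.name(appId, source.sourceName)
  case None        => MetricRegistry.name(source.sourceName)
}
{code}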

When collecting Spark metrics in Graphite, I've found this isn't always
desirable. Graphite is designed to track a mostly-unchanging set of metrics
over time: it allocates a large, zeroed-out file for each metric it sees, and
[by default rate-limits itself from creating many of these|https://github.com/graphite-project/carbon/blob/79158ffde5949b4056eb7fdb5e9b6b583fe21ea4/conf/carbon.conf.example#L61-L68].
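
Concretely, the cap lives in carbon's config and looks roughly like this
(default value from the linked revision; comment paraphrased):

{noformat}
# carbon.conf: softly limits how many new whisper files carbon will
# create per minute; creates beyond the limit are deferred.
MAX_CREATES_PER_MINUTE = 50
{noformat}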

App-ID namespacing means that Graphite allocates disk space for every "metric"
of every job it sees, even though some metrics are logically the same across
jobs (e.g. driver JVM stats).
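
For example, two runs of the same job produce two entirely separate metric
trees (hypothetical app IDs and metric path):

{noformat}
app-20150216161718-0000.driver.jvm.heap.used
app-20150217093045-0001.driver.jvm.heap.used
{noformat}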

Some common Spark workflows would be better modeled by namespacing metrics by
{{spark.app.name}} instead, so that successive runs of a given job share
metrics: this saves storage, and it also makes it possible to monitor aspects
of a job's performance over time, across many runs.
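
Under {{spark.app.name}} namespacing, both of the runs above would instead
write into a single tree, e.g.:

{noformat}
my-etl-job.driver.jvm.heap.used
{noformat}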

There's not likely a one-size-fits-all solution here, so I'd propose letting
the metrics config file specify what to namespace metrics by:
{{spark.app.id}}, {{spark.app.name}}, or some other config param.
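
One possible shape for this, sketched in {{conf/metrics.properties}} (the
property name here is hypothetical, not an existing Spark option):

{noformat}
# Hypothetical property: name the config param whose value should
# namespace all metrics. spark.app.id would give current behavior.
*.namespace=spark.app.name
{noformat}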


