[ 
https://issues.apache.org/jira/browse/SPARK-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994525#comment-14994525
 ] 

Steve Loughran commented on SPARK-11373:
----------------------------------------

I have in my head roughly how to do this; in SPARK-1537 I've got more complex 
metrics being collected.

I'd have the providers themselves register their metrics; they'd just be given 
the registry and told to do it. I'd do this by adding a new method to the base 
class, {{start(BindingInfo)}}, where {{BindingInfo}} would be a class with 
currently just one entry, "metrics registry". (I'd do it that way so that we 
could add more binding info without breaking plugins in future).
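
A minimal sketch of that shape, assuming the Codahale {{MetricRegistry}}. The field name {{metrics}} and the {{HistoryProviderBase}} and {{DemoProvider}} classes are hypothetical stand-ins for illustration, not Spark's actual {{ApplicationHistoryProvider}}:

```scala
import com.codahale.metrics.{Counter, MetricRegistry}

// Hypothetical binding holder: one entry today. A plain class (rather than
// a case class) so that entries can be added later without changing the
// start() signature that plugins implement.
class BindingInfo(val metrics: MetricRegistry)

// Stand-in for the provider base class (an assumption, not Spark's real one).
abstract class HistoryProviderBase {
  // Default is a no-op, so providers that ignore metrics keep working.
  def start(binding: BindingInfo): Unit = {}
}

class DemoProvider extends HistoryProviderBase {
  val loads = new Counter()

  override def start(binding: BindingInfo): Unit = {
    // The provider picks its own metric names and registers them itself.
    binding.metrics.register(MetricRegistry.name("demo", "history.loads"), loads)
  }
}
```

The server side would then only need to create one registry and call {{start(new BindingInfo(registry))}} on whichever provider it instantiated.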

In {{FsHistoryProvider.start(BindingInfo)}} I'd move all the thread-starting 
code from the constructor. Starting threads in a constructor is trouble, 
especially for subclassing (and, yes, mock tests). The provider could also 
register some new metrics there.
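
To illustrate why the thread start belongs in {{start()}}, here is a dependency-free sketch. {{PollingProvider}}, {{pollOnce}} and the thread name are hypothetical, and the {{BindingInfo}} argument is left out to keep the example standalone:

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical provider: the constructor only sets up state, so subclasses
// and mocks can be instantiated without any background activity.
class PollingProvider {
  private val started = new AtomicBoolean(false)
  @volatile private var poller: Option[Thread] = None

  // Overridable hook for the background work (a no-op here).
  protected def pollOnce(): Unit = {}

  def isStarted: Boolean = started.get()

  // The thread is created here rather than in the constructor, so it can
  // never call pollOnce() on a subclass whose fields are not yet
  // initialized. A second call is ignored.
  def start(): Unit = {
    if (started.compareAndSet(false, true)) {
      val t = new Thread(new Runnable {
        override def run(): Unit = pollOnce()
      }, "history-poller")
      t.setDaemon(true)
      t.start()
      poller = Some(t)
    }
  }
}
```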

For the {{YarnHistoryProvider}}, I've [already got some 
counters|https://github.com/steveloughran/spark/blob/stevel/feature/SPARK-1537-ATS/yarn/src/history/main/scala/org/apache/spark/deploy/history/yarn/server/YarnHistoryProvider.scala#L212]
 —they're just atomic longs in the class. In the publisher code, I've factored 
out these counters, switched them to Codahale {{Counter}} classes, and then 
[register 
them|https://github.com/steveloughran/spark/blob/stevel/feature/SPARK-1537-publisher/yarn/src/history/main/scala/org/apache/spark/deploy/history/yarn/YarnHistoryService.scala#L1078].

That's what I'd do in the providers: let them make up their own metrics and 
register them.
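
Roughly, the pattern looks like the sketch below. It is not the code in the linked branches; the class name, metric names and prefix are made up for illustration, and only the Codahale {{Counter}} and {{MetricRegistry}} calls are real API:

```scala
import com.codahale.metrics.{Counter, MetricRegistry}

// Hypothetical provider-side metrics; names are illustrative only.
class ProviderMetrics {
  // Values like these would previously have been bare AtomicLong fields.
  val refreshCount = new Counter()
  val refreshFailures = new Counter()

  // The provider decides what to publish and under which names.
  def registerAll(registry: MetricRegistry, prefix: String): Unit = {
    registry.register(MetricRegistry.name(prefix, "refresh.count"), refreshCount)
    registry.register(MetricRegistry.name(prefix, "refresh.failures"), refreshFailures)
  }
}
```

The provider keeps calling {{inc()}} on its own fields; anything reading the registry sees the same counter objects.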

Now, the next fun issue is: how to publish this? That is, how to read in the 
config and have the server hook up its metrics? I'd actually like the default 
to be the Codahale metrics servlets, as I've found these great for 
functional "metrics first" tests: you manipulate the system and verify that 
the metrics notice. The web servlets are trivial. As for hooking up to 
Ganglia, Graphite, systemd, etc.: I have no idea where to begin.
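
For the servlet part, a rough standalone sketch, assuming the {{metrics-servlets}} module and an embedded Jetty 9 are on the classpath. {{MetricsEndpoint}} is a hypothetical helper; how the History Server's own web UI would mount the servlet is not shown here:

```scala
import com.codahale.metrics.MetricRegistry
import com.codahale.metrics.servlets.MetricsServlet
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.servlet.{ServletContextHandler, ServletHolder}

object MetricsEndpoint {
  // Builds (but does not start) a Jetty server that serves the registry
  // contents as JSON under /metrics.
  def build(registry: MetricRegistry, port: Int): Server = {
    val server = new Server(port)
    val context = new ServletContextHandler()
    context.setContextPath("/")
    context.addServlet(new ServletHolder(new MetricsServlet(registry)), "/metrics")
    server.setHandler(context)
    server
  }
}
```

A "metrics first" test would start this server, drive the system under test, fetch {{/metrics}} and assert on the values in the returned JSON.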

Anyway, if you want to work on this, I'll try to help. I'll certainly help with 
the binding to the providers, and show you how to bind the Codahale servlets. 
I'll leave it to you to work out how to do the broader metrics bindings.

> Add metrics to the History Server and providers
> -----------------------------------------------
>
>                 Key: SPARK-11373
>                 URL: https://issues.apache.org/jira/browse/SPARK-11373
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Steve Loughran
>
> The History server doesn't publish metrics about JVM load or anything from 
> the history provider plugins. This means that performance problems from 
> massive job histories aren't visible to management tools, and nor are any 
> provider-generated metrics such as time to load histories, failed history 
> loads, the number of connectivity failures talking to remote services, etc.
> If the history server set up a metrics registry and offered the option to 
> publish its metrics, then management tools could view this data.
> # the metrics registry would need to be passed down to the instantiated 
> {{ApplicationHistoryProvider}}, in order for it to register its metrics.
> # if the codahale metrics servlet were registered under a path such as 
> {{/metrics}}, the values would be visible as HTML and JSON, without the need 
> for management tools.
> # Integration tests could also retrieve the JSON-formatted data and use it as 
> part of the test suites.



