[ https://issues.apache.org/jira/browse/SPARK-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994525#comment-14994525 ]
Steve Loughran commented on SPARK-11373:
----------------------------------------

I have in my head roughly how to do this; in SPARK-1537 I've got more complex metrics being collected.

I'd have the providers themselves register their metrics; they'd just be given the registry and told to do it. I'd do this by adding a new method to the base class, {{start(BindingInfo)}}, where {{BindingInfo}} would be a class with currently just one entry, "metrics registry". (I'd do it that way so that we could add more binding info without breaking plugins in future.)

I'd move all the thread-starting code from the constructor into {{FsHistoryProvider.start(BindingInfo)}}. Starting threads in the constructor is trouble, especially for subclassing (and yes, mock tests). It could also add some new metric values.

For the {{YarnHistoryProvider}}, I've [already got some counters|https://github.com/steveloughran/spark/blob/stevel/feature/SPARK-1537-ATS/yarn/src/history/main/scala/org/apache/spark/deploy/history/yarn/server/YarnHistoryProvider.scala#L212]; they're just atomic longs in the class. In the publisher code, I've factored out these counters, switched them to Codahale {{Counter}} classes, and then [register them|https://github.com/steveloughran/spark/blob/stevel/feature/SPARK-1537-publisher/yarn/src/history/main/scala/org/apache/spark/deploy/history/yarn/YarnHistoryService.scala#L1078]. That's what I'd do in the providers: let them make up their own metrics and register them.

Now, the next fun issue is: how to publish this? That is: how to read in the config and have the server hook up its metrics? I'd actually like the default to just be to use the codahale metrics servlets, as I've found these great for functional "metrics first" tests: you manipulate the system and verify the metrics notice. The web servlets are trivial. Supporting hooking up to ganglia, graphite, systemd, etc.: I have no idea where to begin.

Anyway, if you want to work on this, I'll try to help.
I'll certainly help with the binding to the providers, and show you how to bind the codahale servlets. I'll leave it to you to work out how to do the broader metrics bindings.

> Add metrics to the History Server and providers
> -----------------------------------------------
>
>                 Key: SPARK-11373
>                 URL: https://issues.apache.org/jira/browse/SPARK-11373
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Steve Loughran
>
> The History server doesn't publish metrics about JVM load or anything from
> the history provider plugins. This means that performance problems from
> massive job histories aren't visible to management tools, and nor are any
> provider-generated metrics such as time to load histories, failed history
> loads, the number of connectivity failures talking to remote services, etc.
> If the history server set up a metrics registry and offered the option to
> publish its metrics, then management tools could view this data.
> # the metrics registry would need to be passed down to the instantiated
> {{ApplicationHistoryProvider}}, in order for it to register its metrics.
> # if the codahale metrics servlet were registered under a path such as
> {{/metrics}}, the values would be visible as HTML and JSON, without the need
> for management tools.
> # Integration tests could also retrieve the JSON-formatted data and use it as
> part of the test suites.
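
A rough sketch of the provider binding proposed in the comment, assuming the Codahale {{metrics-core}} library is on the classpath. Only {{MetricRegistry}} and {{Counter}} are real library classes; the field name inside {{BindingInfo}}, the classes {{HistoryProviderSketch}} and {{ExampleProvider}}, the {{recordLoad}} helper and the metric names are all hypothetical placeholders, not existing Spark code.

```scala
import com.codahale.metrics.{Counter, MetricRegistry}
import scala.collection.JavaConverters._

// Hypothetical: binding information handed to a provider when it starts.
// It carries only the metrics registry for now; further fields could be
// added later without changing the start() signature.
final case class BindingInfo(metrics: MetricRegistry)

// Hypothetical stand-in for the provider base class, reduced to the
// proposed lifecycle hook.
abstract class HistoryProviderSketch {
  def start(binding: BindingInfo): Unit
}

// Hypothetical provider that owns its counters and registers them itself.
class ExampleProvider extends HistoryProviderSketch {
  val loadCount = new Counter()
  val loadFailures = new Counter()

  override def start(binding: BindingInfo): Unit = {
    // Metric names are made up for this sketch.
    binding.metrics.register(
      MetricRegistry.name("history", "provider", "loads"), loadCount)
    binding.metrics.register(
      MetricRegistry.name("history", "provider", "load-failures"), loadFailures)
    // Any background threads would be started here rather than in the
    // constructor, so that constructing the provider has no side effects.
  }

  // Hypothetical hook the provider would call after each history load.
  def recordLoad(succeeded: Boolean): Unit = {
    loadCount.inc()
    if (!succeeded) loadFailures.inc()
  }
}

object BindingSketch {
  def main(args: Array[String]): Unit = {
    val registry = new MetricRegistry()
    val provider = new ExampleProvider
    provider.start(BindingInfo(registry))
    provider.recordLoad(succeeded = true)
    provider.recordLoad(succeeded = false)
    // Print every counter the provider registered.
    registry.getCounters.asScala.foreach { case (name, counter) =>
      println(s"$name = ${counter.getCount}")
    }
  }
}
```

Because the counters are created by the provider and only attached to the registry in {{start}}, a test can construct the provider, call {{start}} with its own registry, and then assert on the counter values, which is the "metrics first" style of test mentioned in the comment.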