Jungtaek Lim created STORM-2909:
-----------------------------------
Summary: New Metrics Reporting API - for 2.0.0
Key: STORM-2909
URL: https://issues.apache.org/jira/browse/STORM-2909
Project: Apache Storm
Issue Type: Improvement
Reporter: P. Taylor Goetz
Assignee: P. Taylor Goetz
This is a proposal to provide a new metrics reporting API based on [Coda Hale's
metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA Dropwizard/Yammer
metrics).
h2. Background
In a [discussion on the dev@ mailing list |
http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3ccagx0urh85nfh0pbph11pmc1oof6htycjcxsxgwp2nnofukq...@mail.gmail.com%3e]
a number of community and PMC members recommended replacing Storm’s metrics
system with a new API as opposed to enhancing the existing metrics system. Some
of the objections to the existing metrics API include:
# Metrics are reported as untyped Java objects, making it very difficult to
reason about how to report them (e.g. is a given metric a gauge, a counter, etc.?)
# It is difficult to determine if metrics coming into the consumer are
pre-aggregated or not.
# Storm’s metrics collection occurs through a specialized bolt, which, in
addition to potentially affecting system performance, complicates certain types
of aggregation when the parallelism of that bolt is greater than one.
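To make the first objection concrete, the following minimal Java sketch contrasts an untyped data point with typed metrics. All names ({{TypedMetricsSketch}}, {{UntypedDataPoint}}, {{Counter}}, {{Gauge}}) are hypothetical and are neither Storm's nor the metrics library's actual classes. The sketch only illustrates why a consumer cannot tell a counter from a gauge when all it receives is an {{Object}}.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are Storm or Dropwizard classes.
class TypedMetricsSketch {

    // Untyped style: the consumer receives a name and an opaque value.
    static final class UntypedDataPoint {
        final String name;
        final Object value;

        UntypedDataPoint(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    // Typed style: the kind of metric is part of its type.
    interface Metric {
        String kind();
    }

    static final class Counter implements Metric {
        private final AtomicLong count = new AtomicLong();

        void inc() { count.incrementAndGet(); }
        long getCount() { return count.get(); }
        public String kind() { return "counter"; }
    }

    static final class Gauge<T> implements Metric {
        private final Supplier<T> supplier;

        Gauge(Supplier<T> supplier) { this.supplier = supplier; }
        T getValue() { return supplier.get(); }
        public String kind() { return "gauge"; }
    }

    // With an untyped value the consumer can only inspect the runtime class:
    // a Long could be a running count or a point-in-time reading.
    static String describeUntyped(UntypedDataPoint p) {
        if (p.value == null) {
            return p.name + ": null";
        }
        if (p.value instanceof Number) {
            return p.name + ": number (counter or gauge? unknown)";
        }
        return p.name + ": " + p.value.getClass().getSimpleName() + " (unknown)";
    }

    // With a typed metric the consumer knows exactly what it is reporting.
    static String describeTyped(String name, Metric m) {
        return name + ": " + m.kind();
    }

    public static void main(String[] args) {
        Counter acked = new Counter();
        acked.inc();
        acked.inc();
        Gauge<Integer> queueSize = new Gauge<>(() -> 7);

        System.out.println(describeUntyped(new UntypedDataPoint("acked", 2L)));
        System.out.println(describeTyped("acked", acked));
        System.out.println(describeTyped("queue-size", queueSize));
    }
}
```

In the untyped case the consumer has to guess the reporting semantics from the value's runtime class. In the typed case the metric itself carries that information.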
In the discussion on the developer mailing list, there is growing consensus for
replacing Storm’s metrics API with a new API based on Coda Hale’s metrics
library. This approach has the following benefits:
# Coda Hale’s metrics library is very stable, performant, well thought out, and
widely adopted among open source projects (e.g. Kafka).
# The metrics library provides many existing metric types: Meters, Gauges,
Counters, Histograms, and more.
# The library has a pluggable “reporter” API for publishing metrics to various
systems, with existing implementations for: JMX, console, CSV, SLF4J, Graphite,
Ganglia.
# Reporters are straightforward to implement, and can be reused by any project
that uses the metrics library (i.e. they would have broader application outside
of Storm).
As noted above, the metrics library supports pluggable reporters for sending
metrics data to other systems, and implementing a reporter is fairly
straightforward (the library's {{ConsoleReporter}}, linked under "Relationship
to Existing Metrics", is one example). For example, if someone develops a
reporter based on Coda Hale’s metrics library, it could be used not only for
pushing Storm metrics but also by any system that uses the metrics library,
such as Kafka.
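The pluggable-reporter idea can be sketched in plain Java as follows. This is a stripped-down illustration under stated assumptions, not the metrics library's API: {{ReporterSketch}}, {{Registry}}, {{Reporter}}, {{ConsoleLineReporter}} and {{InMemoryReporter}} are hypothetical names, and only counters are modelled. In the real library the corresponding roles are played by {{MetricRegistry}} and reporter classes such as {{ConsoleReporter}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the pluggable-reporter pattern; these are not the
// Dropwizard classes, only a minimal stand-in for them.
class ReporterSketch {

    // A registry maps metric names to metrics (only counters, for brevity).
    static final class Registry {
        private final Map<String, AtomicLong> counters = new TreeMap<>();

        // Returns the counter registered under the name, creating it if needed.
        synchronized AtomicLong counter(String name) {
            return counters.computeIfAbsent(name, n -> new AtomicLong());
        }

        // Point-in-time copy, sorted by name, so reporters never see a map
        // that is being modified.
        synchronized Map<String, Long> snapshot() {
            Map<String, Long> copy = new TreeMap<>();
            counters.forEach((name, value) -> copy.put(name, value.get()));
            return copy;
        }
    }

    // The only contract a reporter implements: publish one snapshot.
    interface Reporter {
        void report(Map<String, Long> counters);
    }

    // Writes one "name=value" line per counter to standard output.
    static final class ConsoleLineReporter implements Reporter {
        public void report(Map<String, Long> counters) {
            counters.forEach((name, value) -> System.out.println(name + "=" + value));
        }
    }

    // Collects lines in memory; stands in for a reporter that pushes to an
    // external system.
    static final class InMemoryReporter implements Reporter {
        final List<String> lines = new ArrayList<>();

        public void report(Map<String, Long> counters) {
            counters.forEach((name, value) -> lines.add(name + "=" + value));
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.counter("tuples.emitted").addAndGet(3);
        registry.counter("tuples.acked").incrementAndGet();

        // The same registry feeds any number of reporters.
        List<Reporter> reporters =
                Arrays.asList(new ConsoleLineReporter(), new InMemoryReporter());
        for (Reporter r : reporters) {
            r.report(registry.snapshot());
        }
    }
}
```

Because a reporter depends only on the registry's snapshot and not on where the metrics were produced, the same reporter could serve Storm or any other project built on the same registry abstraction.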
h2. Scope of Effort
The effort to implement a new metrics API for Storm can be broken down into the
following development areas:
# Implement API for Storm’s internal worker metrics: latencies, queue sizes,
capacity, etc.
# Implement API for user defined, topology-specific metrics (exposed via the
{{org.apache.storm.task.TopologyContext}} class)
# Implement API for Storm daemons: Nimbus, Supervisor, etc.
h2. Relationship to Existing Metrics
This would be a new API that would not affect the existing metrics API. Upon
completion, the old metrics API would presumably be deprecated, but kept in
place for backward compatibility.
Internally, the current metrics API uses Storm bolts for the reporting
mechanism. The proposed metrics API would not depend on any of Storm's
messaging capabilities and instead use the [metrics library's built-in reporter
mechanism |
http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This would
allow users to use existing {{Reporter}} implementations which are not
Storm-specific, and would simplify the process of collecting metrics. Compared
to Storm's {{IMetricsConsumer}} interface, implementing a reporter for the
metrics library is much more straightforward (an example can be found [here |
https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]).
The new metrics capability would not use or affect the ZooKeeper-based metrics
used by Storm UI.
h2. Relationship to JStorm Metrics
[TBD]
h2. Target Branches
[TBD]
h2. Performance Implications
[TBD]
h2. Metrics Namespaces
[TBD]
h2. Metrics Collected
*Worker*
|| Namespace || Metric Type || Description ||
*Nimbus*
|| Namespace || Metric Type || Description ||
*Supervisor*
|| Namespace || Metric Type || Description ||
h2. User-Defined Metrics
[TBD]