[ 
https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588873#comment-15588873
 ] 

Robert Joseph Evans commented on STORM-2153:
--------------------------------------------

I agree with all that has been said here.  There really are several smaller 
pieces of this project that each need to be addressed separately.

1) End User API
2) Reporting API
3) Default reporting implementation
4) Query API
5) Default Query Implementation
6) UI Updates

Most of these pieces can be worked on separately and somewhat independently.

We all seem to agree on 1 and 2 being stock Codahale.  I know others have moved 
away from Codahale in the past, but if we run into issues I would rather try to 
work them out with the Codahale community rather than go it on our own.
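To make the appeal of 1 and 2 concrete, here is a stdlib-only sketch of the typed-registry shape Codahale gives us (the real proposal would expose Codahale's own MetricRegistry and Counter classes directly; the class and metric names here are purely illustrative). The point it shows is that a typed lookup fixes the untyped-Object problem in the current IMetricsConsumer API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch only -- Codahale's MetricRegistry works like this,
// but the real API would come from the metrics library, not this class.
class RegistrySketch {
    public static final class Counter {
        private final LongAdder count = new LongAdder();
        public void inc() { count.increment(); }
        public long getCount() { return count.sum(); }
    }

    private final ConcurrentMap<String, Counter> counters = new ConcurrentHashMap<>();

    // Typed lookup: callers always get back a Counter object, so a consumer
    // never has to guess whether an opaque Object is a gauge or a counter.
    public Counter counter(String name) {
        return counters.computeIfAbsent(name, k -> new Counter());
    }
}
```

Repeated lookups of the same name return the same metric instance, which is what lets independent components contribute to one shared counter.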

For 3, the default reporting implementation, I think the simplest approach to 
start with is to have a default reporter periodically write metrics to the 
local file system, and have the supervisor pick them up and report them through 
thrift to nimbus.  This means we do not have to worry about security: the 
supervisor can already authenticate with nimbus, so no new trust relationship 
is needed.
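A stdlib-only sketch of the shape such a default reporter might take (the real thing would more likely be a Codahale ScheduledReporter; the class name, the one-"name value"-per-line file format, and the flush cadence are all assumptions for illustration):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: periodically dump a metrics snapshot to a local file;
// the supervisor would read the file and forward it to nimbus over thrift.
class LocalFileReporter {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final Path outFile;
    private final ConcurrentMap<String, Long> counters = new ConcurrentHashMap<>();

    LocalFileReporter(Path outFile) { this.outFile = outFile; }

    public void inc(String name) { counters.merge(name, 1L, Long::sum); }

    // Write one "name value" line per counter (an assumed format).
    public void flush() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        try {
            Files.write(outFile, sb.toString().getBytes());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::flush, periodSeconds,
                                      periodSeconds, TimeUnit.SECONDS);
    }

    public void stop() { scheduler.shutdown(); }
}
```

Because only the supervisor touches nimbus, the worker-side reporter needs no credentials at all, which is the security win described above.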

For 5, I think using the JStorm RocksDB implementation as a starting point is 
great, and others seem to agree.
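To illustrate why a RocksDB-style store fits this workload, here is a sketch of the key layout such a store might use, with a sorted map standing in for RocksDB (which also orders keys bytewise): the key is metricId followed by timestamp, so all points for one metric over a time window form one contiguous range scan. The layout and names are hypothetical, not JStorm's actual schema:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of a time-series key scheme; a TreeMap with unsigned bytewise
// ordering stands in for RocksDB, which compares keys the same way.
class MetricStoreSketch {
    private final NavigableMap<byte[], Double> db =
        new TreeMap<>(Arrays::compareUnsigned);

    // key = 4-byte metric id | 8-byte timestamp, big-endian, so keys for one
    // metric sort by time and sit next to each other.
    static byte[] key(int metricId, long timestampMs) {
        return ByteBuffer.allocate(12).putInt(metricId).putLong(timestampMs).array();
    }

    public void put(int metricId, long ts, double value) {
        db.put(key(metricId, ts), value);
    }

    // All points for one metric in [from, to): a single contiguous scan,
    // which is exactly the access pattern an LSM store like RocksDB is good at.
    public List<Double> scan(int metricId, long from, long to) {
        return new ArrayList<>(db.subMap(key(metricId, from), key(metricId, to)).values());
    }
}
```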

4 and 6 are things that we have not really addressed here.  We probably should 
look at what others are doing here, and possibly copy a stripped-down version 
of OpenTSDB or Druid as an initial starting point.
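As a hint of what even a stripped-down query layer would need to offer, here is a sketch of the most basic OpenTSDB-style operation: downsampling raw points into fixed-width, averaged time buckets so the UI can chart a long window cheaply. The signature and names are made up for illustration:

```java
// Hypothetical downsampling primitive for a minimal query API: average raw
// (timestamp, value) points into n buckets of bucketMs width starting at start.
class Downsample {
    public static double[] avgBuckets(long[] ts, double[] vals,
                                      long start, long bucketMs, int n) {
        double[] sum = new double[n];
        int[] cnt = new int[n];
        for (int i = 0; i < ts.length; i++) {
            int b = (int) ((ts[i] - start) / bucketMs);
            if (b >= 0 && b < n) {   // drop points outside the queried window
                sum[b] += vals[i];
                cnt[b]++;
            }
        }
        double[] out = new double[n];
        for (int b = 0; b < n; b++) {
            out[b] = cnt[b] == 0 ? Double.NaN : sum[b] / cnt[b];  // NaN = empty bucket
        }
        return out;
    }
}
```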

My proposal would be to file a separate JIRA for each of these pieces.  Many of 
these pieces can provide value without the others fully in place.  Having a
new metrics/reporter API based on Codahale that is parallel to the 
IMetricsConsumer we have right now would be a good start.  It would fix a lot 
of the issues with IMetricsConsumer but we wouldn't have to tie it into an 
internal reporting system yet.  We could even implement it in 1.x and deprecate 
the older API there as well.

Having a time-series database that does nothing but store the metrics 
currently reported through ZK would be a great step too.  It would not be 
perfect, but at least we would have a history for the metrics, and they would 
not reset every time a worker crashes.

Once those two are in place we can glue the different pieces together.

It feels like 3 phases each independent and each fairly manageable.

> New Metrics Reporting API
> -------------------------
>
>                 Key: STORM-2153
>                 URL: https://issues.apache.org/jira/browse/STORM-2153
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: P. Taylor Goetz
>
> This is a proposal to provide a new metrics reporting API based on [Coda 
> Hale's metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA 
> Dropwizard/Yammer metrics).
> h2. Background
> In a [discussion on the dev@ mailing list | 
> http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3ccagx0urh85nfh0pbph11pmc1oof6htycjcxsxgwp2nnofukq...@mail.gmail.com%3e]
>   a number of community and PMC members recommended replacing Storm’s metrics 
> system with a new API as opposed to enhancing the existing metrics system. 
> Some of the objections to the existing metrics API include:
> # Metrics are reported as an untyped Java object, making it very difficult to 
> reason about how to report it (e.g. is it a gauge, a counter, etc.?)
> # It is difficult to determine if metrics coming into the consumer are 
> pre-aggregated or not.
> # Storm’s metrics collection occurs through a specialized bolt, which in 
> addition to potentially affecting system performance, complicates certain 
> types of aggregation when the parallelism of that bolt is greater than one.
> In the discussion on the developer mailing list, there is growing consensus 
> for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics 
> library. This approach has the following benefits:
> # Coda Hale’s metrics library is very stable, performant, well thought out, 
> and widely adopted among open source projects (e.g. Kafka).
> # The metrics library provides many existing metric types: Meters, Gauges, 
> Counters, Histograms, and more.
> # The library has a pluggable “reporter” API for publishing metrics to 
> various systems, with existing implementations for: JMX, console, CSV, SLF4J, 
> Graphite, Ganglia.
> # Reporters are straightforward to implement, and can be reused by any 
> project that uses the metrics library (i.e. would have broader application 
> outside of Storm)
> As noted earlier, the metrics library supports pluggable reporters for 
> sending metrics data to other systems, and implementing a reporter is fairly 
> straightforward (an example reporter implementation can be found here). For 
> example if someone develops a reporter based on Coda Hale’s metrics, it could 
> not only be used for pushing Storm metrics, but also for any system that used 
> the metrics library, such as Kafka.
> h2. Scope of Effort
> The effort to implement a new metrics API for Storm can be broken down into 
> the following development areas:
> # Implement API for Storm's internal worker metrics: latencies, queue sizes, 
> capacity, etc.
> # Implement API for user defined, topology-specific metrics (exposed via the 
> {{org.apache.storm.task.TopologyContext}} class)
> # Implement API for storm daemons: nimbus, supervisor, etc.
> h2. Relationship to Existing Metrics
> This would be a new API that would not affect the existing metrics API. Upon 
> completion, the old metrics API would presumably be deprecated, but kept in 
> place for backward compatibility.
> Internally the current metrics API uses Storm bolts for the reporting 
> mechanism. The proposed metrics API would not depend on any of Storm's messaging 
> capabilities and instead use the [metrics library's built-in reporter 
> mechanism | 
> http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This 
> would allow users to use existing {{Reporter}} implementations which are not 
> Storm-specific, and would simplify the process of collecting metrics. 
> Compared to Storm's {{IMetricCollector}} interface, implementing a reporter 
> for the metrics library is much more straightforward (an example can be found 
> [here | 
> https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]).
> The new metrics capability would not use or affect the ZooKeeper-based 
> metrics used by Storm UI.
> h2. Relationship to JStorm Metrics
> [TBD]
> h2. Target Branches
> [TBD]
> h2. Performance Implications
> [TBD]
> h2. Metrics Namespaces
> [TBD]
> h2. Metrics Collected
> *Worker*
> || Namespace || Metric Type || Description ||
> *Nimbus*
> || Namespace || Metric Type || Description ||
> *Supervisor*
> || Namespace || Metric Type || Description ||
> h2. User-Defined Metrics
> [TBD]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
