Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/2203
  
    @ptgoetz for me this is all about the metadata for the metric.
    
    The code we have been working on for storing metrics in a DB that we can 
then use for scheduling is getting close to being done.  We have a data 
structure like the following. (NOTE still a work in progress)
    
    ```
    struct WorkerMetric {
      1: required string metricName;
      2: required i64 timestamp;
      3: required double metricValue;
      4: required string componentId;
      5: required string executorId;
      6: required string streamId;
    }
    
    struct WorkerMetricList {
      1: list<WorkerMetric> metrics;
    }
    
    struct WorkerMetrics {
      1: required string topologyId;
      2: required i32 port;
      3: required string hostname;
      4: required WorkerMetricList metricList;
    }
    ```
    
    Having metadata separated out for topology, host, port, etc. allows us to 
easily ask questions like how many messages were emitted by all instances of 
bolt "B" on streamId "default" for the topology "T".  I assume people using 
Ganglia or Nagios want to do the same thing without having to set up a separate 
metric for each possible bolt/spout/hostname/port combination.
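    
    As a rough illustration of that kind of question, here is a minimal Java 
sketch.  The TaggedMetric class and its sum helper are hypothetical (a flattened 
stand-in for the Thrift structs above, not code from the PR), and the metric 
name "emitted" is only an assumed example.
    
    ```java
    import java.util.List;

    // Hypothetical plain-Java mirror of the Thrift structs above, flattened so
    // each sample carries its own topology, component, and stream metadata.
    class TaggedMetric {
        final String topologyId;
        final String componentId;
        final String streamId;
        final String metricName;
        final double metricValue;

        TaggedMetric(String topologyId, String componentId, String streamId,
                     String metricName, double metricValue) {
            this.topologyId = topologyId;
            this.componentId = componentId;
            this.streamId = streamId;
            this.metricName = metricName;
            this.metricValue = metricValue;
        }

        // Sum one metric over every sample whose tags match, regardless of
        // which host, port, or executor reported it.
        static double sum(List<TaggedMetric> samples, String topologyId,
                          String componentId, String streamId, String metricName) {
            return samples.stream()
                .filter(m -> m.topologyId.equals(topologyId))
                .filter(m -> m.componentId.equals(componentId))
                .filter(m -> m.streamId.equals(streamId))
                .filter(m -> m.metricName.equals(metricName))
                .mapToDouble(m -> m.metricValue)
                .sum();
        }
    }
    ```
    
    With the tags stored as separate fields the query is a plain filter; no 
metric name has to be parsed to answer it.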
    
    Almost every metrics system I have seen supports these kinds of metadata or 
tags, but almost none of the metrics APIs I have seen support them (Dropwizard 
included).  As such we need a way to parse these values out of the metric name 
in a reliable way.  Making the name more configurable does not help me with 
this.  In fact, it makes it much more difficult.
    
    The current proposed format is:
    ```
    String.format("storm.worker.%s.%s.%s.%s-%s",
        stormId, hostName, componentId, workerPort, name);
    ```
    
    It has most of the information we need.  It is missing the stream id and 
executor id, which we really need for some internal metrics if we are to 
replace the metrics on the UI and reduce the load on ZooKeeper.
    
    But the issue for me is: how do I get the metadata back out in a reliable 
way?  The topology id is simple to parse out because no '.' character is 
allowed in it.  hostName is going to have '.' in it, so I don't know what to do 
there.  componentId and streamId are, I think, totally open, so again I am in 
trouble no matter what I do.  For the name itself we could add some 
restrictions to make parsing simpler.
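    
    To make the ambiguity concrete, here is a small Java sketch.  The 
metricName helper simply wraps the proposed format string; the class name and 
all of the sample values (topology id, host names, component ids) are made up 
for illustration, and it assumes a component id may contain '.'.
    
    ```java
    class MetricNameAmbiguity {
        // Wraps the proposed format string so it can be called with sample values.
        static String metricName(String stormId, String hostName,
                                 String componentId, int workerPort, String name) {
            return String.format("storm.worker.%s.%s.%s.%s-%s",
                stormId, hostName, componentId, workerPort, name);
        }

        public static void main(String[] args) {
            // Two different (hostName, componentId) pairs...
            String a = metricName("T-1-100", "node1.example.com", "B", 6700, "emitted");
            String b = metricName("T-1-100", "node1.example", "com.B", 6700, "emitted");
            // ...produce the same metric name, so no parser can tell them apart.
            System.out.println(a.equals(b));
        }
    }
    ```
    
    Once two distinct inputs map to the same string, the original fields 
cannot be recovered from the name alone.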
    
    All I want is some API that can take a metric name and reliably return to 
me the fields that we put into creating the name.
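    
    One possible shape for such an API, as a sketch only: escape the separator 
inside each field before joining, so that splitting on '.' always yields the 
original fields.  MetricNameCodec, its method names, and the '%' escaping 
scheme are all assumptions of mine, not part of the PR.  It also separates the 
name with '.' instead of '-', which differs from the proposed format, and it 
assumes the metrics backend accepts '%' in names.
    
    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical codec: every field is escaped so that '.' appears in the
    // final name only as a field separator.
    class MetricNameCodec {
        private static final String PREFIX = "storm.worker.";
        private static final String[] FIELDS =
            {"stormId", "hostName", "componentId", "workerPort", "name"};

        // Escape '%' first so that escaped dots stay unambiguous.
        static String escape(String s) {
            return s.replace("%", "%25").replace(".", "%2E");
        }

        // Reverse order of escape: restore dots first, then '%'.
        static String unescape(String s) {
            return s.replace("%2E", ".").replace("%25", "%");
        }

        static String encode(String stormId, String hostName, String componentId,
                             int workerPort, String name) {
            String[] values =
                {stormId, hostName, componentId, String.valueOf(workerPort), name};
            StringBuilder sb = new StringBuilder("storm.worker");
            for (String v : values) {
                sb.append('.').append(escape(v));
            }
            return sb.toString();
        }

        // Returns the fields in the order they were put into the name.
        static Map<String, String> decode(String metricName) {
            if (!metricName.startsWith(PREFIX)) {
                throw new IllegalArgumentException("not a worker metric: " + metricName);
            }
            // Limit -1 keeps empty trailing fields instead of dropping them.
            String[] parts = metricName.substring(PREFIX.length()).split("\\.", -1);
            if (parts.length != FIELDS.length) {
                throw new IllegalArgumentException("unexpected field count: " + metricName);
            }
            Map<String, String> out = new LinkedHashMap<>();
            for (int i = 0; i < FIELDS.length; i++) {
                out.put(FIELDS[i], unescape(parts[i]));
            }
            return out;
        }
    }
    ```
    
    The same approach would extend to streamId and executorId by adding them 
to the field list.  Any scheme with this round-trip property would do; the 
escaping here is just the simplest one I could think of.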

