I ran into the same set of issues a little while back when I looked into writing a MetricsConsumer implementation that pushed metrics data to Ganglia. Keeping it generic enough to be useful for user-defined metrics was enough of a struggle that I punted. And as you alluded to, mapping the data to Ganglia's model was a challenge.
I ultimately cobbled together a one-off bridge that pushed Storm UI metrics to Ganglia by polling Nimbus, along with JVM metrics pulled from the JMX API. I'm also a big fan of Coda Hale's metrics library that Michael Knoll pointed out. It's solid, easy to use, and integrates well with many monitoring/metrics systems.

- Taylor

On Feb 24, 2014, at 6:12 PM, Bobby Evans <[email protected]> wrote:

> On that note, I have really wanted to add more metrics to the perf test
> using the Storm metrics subsystem, not just the metrics that are
> uploaded to ZK and accessible through Nimbus, but I have not found time
> to do that. Writing a generic metrics aggregator is almost impossible
> because the metrics system can send anything across the wire. In many
> cases the values are numbers, but there are also many cases where a
> value is a map of string to number, and user-defined metrics can send
> something even more complex. And even when a value is just a number,
> most of the time you can take it as an incremental update to a running
> count (e.g. number of events processed over the last N seconds), but in
> some cases it is an absolute figure (heap space used by the VM, or
> number of events queued in the disruptor queue).
>
> You almost have to look at every metric that is printed out and decide
> whether you want to process it, and if so how to map it into your
> monitoring/metrics system of choice. The logging metrics consumer is
> simple, but not what most people will want to use.
>
> Then there are the latency metrics, where it gets even more complex:
> to aggregate them you also need the corresponding event counts. The
> metrics for these that you get from the UI/Nimbus handle this for you,
> but with this system you need to do some of the math yourself to
> compute a latency weighted by the event throughput.
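[Editor's note: Bobby's point about weighting latency by event throughput can be made concrete. The sketch below is an illustration, not Storm's actual wire format; the sample tuples of (event count, mean latency) per window/executor are hypothetical. The key idea is that the correct aggregate is the count-weighted mean, not the plain average of the window latencies.]

```python
# Hypothetical per-window samples: (event_count, mean_latency_ms).
# One mostly idle window with a huge latency would dominate a naive
# average even though almost no events experienced that latency.
samples = [(1000, 5.0), (10, 500.0), (490, 10.0)]

def weighted_latency(samples):
    """Count-weighted mean latency across windows/executors."""
    total_events = sum(count for count, _ in samples)
    if total_events == 0:
        return 0.0
    return sum(count * latency for count, latency in samples) / total_events

naive = sum(lat for _, lat in samples) / len(samples)  # treats every window equally
weighted = weighted_latency(samples)                   # weights by event throughput
```

Here `naive` comes out above 170 ms while `weighted` is under 10 ms, because the 500 ms window covered only 10 of the 1500 events. This is the math Bobby says you must do yourself when consuming the raw metrics stream rather than the pre-aggregated UI/Nimbus numbers.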
>
> -Bobby
>
> On 2/24/14, 12:06 PM, "Otávio Carvalho" <[email protected]> wrote:
>
>> You can also take a look at the storm-perf-test source code
>> (https://github.com/yahoo/storm-perf-test/).
>> I'm currently trying to extract some metrics in order to develop
>> benchmarks for Storm and other stream processors, and I found it
>> really useful.
>>
>> Thanks,
>>
>> Otávio.
>>
>> Undergraduate Student at Federal University of Rio Grande do Sul -
>> http://inf.ufrgs.br
>> Scholarship holder at Parallel and Distributed Processing Group -
>> http://gppd.inf.ufrgs.br
>> [email protected] / @otaviocarvalho
>>
>> 2014-02-24 12:50 GMT-03:00 Milinda Pathirage <[email protected]>:
>>
>>> Hi Padma,
>>>
>>> I think the answers to your questions are in the article you
>>> mentioned. Anyway, I'll try to explain briefly what needs to be done.
>>> Note that I don't have any experience with statsd or Graphite.
>>>
>>> First, on summarizing the metrics in the metrics.log file: if you
>>> want to summarize the metrics mentioned in the article, you will
>>> have to write your own summarizer. It depends on what data you
>>> collected and how you want to summarize it. The fields in
>>> metrics.log contain information such as the timestamp of the metrics
>>> publish event, the Storm host name, the bolt identifier, the metric
>>> identifier, and the actual metric value. You should be able to
>>> understand the format by reading [2].
>>>
>>> - It looks like people are using statsd to feed Graphite [1], and
>>> the author of the article you mentioned is also planning to use the
>>> same approach.
>>> - In this case you first need to write a metrics consumer that
>>> publishes metrics to statsd.
>>> - Then connect statsd and Graphite according to [1].
>>>
>>> I think it's possible to write a metrics consumer that feeds
>>> Graphite directly, but I am not sure which approach is easier.
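[Editor's note: the fields Milinda lists can be pulled out of a metrics.log line with a small parser. The layout below is an assumption for illustration only, using a tab-separated line; the real format is whatever LoggingMetricsConsumer.java (linked as [2] below) emits, so check that source before relying on this.]

```python
# Assumed line layout (hypothetical, tab-separated):
# <timestamp>\t<host:port>\t<component id>\t<metric name>\t<value>
def parse_metrics_line(line):
    """Split one assumed metrics.log line into the fields Milinda lists."""
    ts, host, component, name, value = line.rstrip("\n").split("\t")
    return {
        "timestamp": int(ts),
        "host": host,
        "component": component,
        "metric": name,
        "value": float(value),
    }

record = parse_metrics_line(
    "1393261200\tworker1:6703\texclaim-bolt\texecute-count\t42"
)
```

A summarizer like the one Milinda describes would then just group these records by component and metric name and fold the values.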
>>>
>>> Thanks,
>>> Milinda
>>>
>>> [1] http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
>>> [2] https://github.com/nathanmarz/storm/blob/master/storm-core/src/jvm/backtype/storm/metric/LoggingMetricsConsumer.java
>>>
>>> On Mon, Feb 24, 2014 at 8:40 AM, padma priya chitturi
>>> <[email protected]> wrote:
>>>> Hi All,
>>>>
>>>> I've been using Storm metrics to visualize the performance of Storm
>>>> (http://www.bigdata-cookbook.com/post/72320512609/storm-metrics-how-to).
>>>>
>>>> I have included the metrics initialization code in the
>>>> ExclamationTopology code and saw the metrics in metrics.log.
>>>>
>>>> How can we summarize the metrics in the metrics.log file? What do
>>>> the different fields mean? How can we visualize the metrics using
>>>> Graphite?
>>>>
>>>> Can someone advise me on this?
>>>>
>>>> Thanks,
>>>> Padma Ch.
>>>
>>>
>>> --
>>> Milinda Pathirage
>>>
>>> PhD Student | Research Assistant
>>> School of Informatics and Computing | Data to Insight Center
>>> Indiana University
>>>
>>> twitter: milindalakmal
>>> skype: milinda.pathirage
>>> blog: http://milinda.pathirage.org
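[Editor's note: as a rough sketch of the statsd-feeding consumer Milinda suggests writing, the code below flattens metric data points, which per Bobby's note may be plain numbers or maps of string to number, into statsd's plaintext gauge format (`name:value|g`). The `storm.` prefix and component naming are assumptions for illustration; in a real consumer this formatting would happen inside an IMetricsConsumer's handleDataPoints callback.]

```python
def to_statsd_lines(component, datapoints):
    """Flatten metric data points into statsd plaintext gauge lines.

    `datapoints` maps a metric name to either a number or a dict of
    sub-name -> number (Storm user metrics can be either shape).
    """
    lines = []
    for name, value in datapoints.items():
        if isinstance(value, dict):
            for sub, num in value.items():
                lines.append(f"storm.{component}.{name}.{sub}:{num}|g")
        else:
            lines.append(f"storm.{component}.{name}:{value}|g")
    return lines

# Delivery to statsd is a plain UDP write (commented out so the sketch
# carries no network dependency):
# import socket
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# for line in to_statsd_lines("exclaim-bolt", {"execute-count": 42}):
#     sock.sendto(line.encode(), ("localhost", 8125))
```

Graphite then picks the values up from statsd's periodic flushes, which is the pipeline described in [1] above.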