James Xu created STORM-147:
------------------------------

             Summary: UI should use metrics framework
                 Key: STORM-147
                 URL: https://issues.apache.org/jira/browse/STORM-147
             Project: Apache Storm (Incubating)
          Issue Type: Improvement
            Reporter: James Xu
            Priority: Minor


https://github.com/nathanmarz/storm/issues/612

If I understand correctly, the stats framework is deprecated in favor of the 
metrics framework. However, the UI currently relies on the older stats 
framework, and so there are duplicated calls to the stats code all through the 
critical loops, and several interesting numbers gathered only by the metrics 
framework (heap info, etc.) are absent from the UI.

A CompileAndZookeepMetrics consumer could listen on the metrics stream, 
assemble data objects that look the same as what the stats framework produces, 
and serialize them into zookeeper. That lets us remove the stats tracking calls 
from the executor and makes it easier to add new metrics to the UI, yet doesn't 
require changes to the underlying UI code. Also, anyone else could use 
zookeeper or thrift to retrieve that synthesized view of the cluster metrics.

My thought is to have one metrics compiler per executor and one per worker. 
Each compiler would maintain a composite object and update it as new metrics 
roll in. When a new value arrives for, say, the emitted count, the compiler 
updates that field in place and leaves all other fields at their last-known 
values. The compiler would clear out its data object on cleanup().
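
A minimal sketch of that compiler idea, assuming metrics arrive as simple name/value pairs. The class name MetricsCompilerSketch and its methods are hypothetical and are not part of Storm's API; the sketch only shows the in-place update and the reset on cleanup.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not a Storm class.
class MetricsCompilerSketch {
    // Last-known value for each metric name, e.g. "emitted" -> 140.
    private final Map<String, Object> lastKnown = new LinkedHashMap<>();

    // Overwrite one field in place; all other fields keep their last-known values.
    synchronized void update(String metricName, Object value) {
        lastKnown.put(metricName, value);
    }

    // Read-only copy of the composite view, e.g. for serializing elsewhere.
    synchronized Map<String, Object> snapshot() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(lastKnown));
    }

    // Drop all compiled state, as proposed for cleanup().
    synchronized void cleanup() {
        lastKnown.clear();
    }

    public static void main(String[] args) {
        MetricsCompilerSketch compiler = new MetricsCompilerSketch();
        compiler.update("emitted", 100L);
        compiler.update("heapUsedBytes", 5_000_000L);
        compiler.update("emitted", 140L); // replaces only the emitted count
        System.out.println(compiler.snapshot());
        compiler.cleanup();
        System.out.println(compiler.snapshot());
    }
}
```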

In the current implementation, the workerbeat has information about the worker 
and all stats rolled up into a single object. We can have the response of the 
current get*Info() thrift calls stay the same, but there would be an increase 
in the number of zookeeper calls needed to build it.

If this is a welcome feature, I believe @arrawatia is excited to implement it.

Data objects stored in ZK: one per worker and one per executor, one per worker 
only, or one per compiled metric?
At what tempo should the compiler push its compiled view to Zookeeper: on each 
metrics update, or on a heartbeat?
(This may be a relative of #527)

----------
nathanmarz: Yes, I would like to see this work done. I think the best would be:

- One Zookeeper metrics consumer per worker
- All metrics stats get routed to local Zookeeper metrics consumer (should make 
  an explicit localGrouping for this that errors if that executor is not there)
- That metrics consumer updates a single node in ZK representing stats for that 
  worker, the same way it works now.
- It should update ZK after it receives N updates, where N is the number of 
  executors in that worker. That will keep the tempo at approximately the same 
  rate as metrics are emitted.
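
The "update ZK after N updates" rule can be sketched as a counter that triggers a flush callback. This is only an illustration under stated assumptions: BatchedFlushSketch is a hypothetical name, and the Runnable stands in for whatever code would actually write the worker's node to ZooKeeper.

```java
// Hypothetical illustration only; not a Storm class.
class BatchedFlushSketch {
    private final int executorsInWorker; // N in the description above
    private final Runnable flush;        // stand-in for the ZK write
    private int pending = 0;

    BatchedFlushSketch(int executorsInWorker, Runnable flush) {
        if (executorsInWorker < 1) {
            throw new IllegalArgumentException("need at least one executor");
        }
        this.executorsInWorker = executorsInWorker;
        this.flush = flush;
    }

    // Call once per metrics update received; flushes on every Nth update.
    synchronized void onUpdate() {
        pending++;
        if (pending >= executorsInWorker) {
            pending = 0;
            flush.run();
        }
    }

    public static void main(String[] args) {
        BatchedFlushSketch batcher =
            new BatchedFlushSketch(3, () -> System.out.println("flush to ZK"));
        for (int i = 0; i < 7; i++) {
            batcher.onUpdate(); // flushes after the 3rd and 6th update
        }
    }
}
```

Counting updates rather than using a timer ties the write rate to the emit rate, which is the property the comment asks for; a worker whose executors report at uneven rates would still flush once per N updates overall.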

----------
mrflip: possible:

- make a MetricsZkSummarizer that populates a thrift-generated object with 
  metrics and serializes them back to zookeeper
- make a subclass SystemZkSummarizer for the specific purpose here
- it sends updates on a tempo of one report per producer
- make the UI work with new object

beautiful:

- Add fields for other interesting numbers to the worker and executor 
  summaries, such as GC and disruptor queues
- display those interesting numbers on UI

fast:

- make a localGrouping, just like the localOrShuffleGrouping, but which errors 
  rather than doing a shuffle
- In system-topology! (common.clj), add an add-system-zk-summarizer! method to 
  attach the SystemZkSummarizer
- currently, the metrics consumers are attached always with the :shuffle 
  grouping (metrics-consumer-bolt-specs). Modify this to get the grouping from 
  the MetricsConsumer instead.

btw -- would the default grouping of a MetricsConsumer be better off as 
:local-or-shuffle, not :shuffle? There doesn't seem to be a reason to deport 
metrics if a consumer bolt is handy.
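
The proposed localGrouping, which errors instead of falling back to a shuffle, might choose its target roughly as follows. This is a sketch under assumptions: LocalOrDieGroupingSketch and chooseTask are hypothetical names, task ids are plain integers, and none of this is Storm's actual grouping interface.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Storm's grouping API.
class LocalOrDieGroupingSketch {
    // Returns the first target task that lives in this worker,
    // or fails loudly instead of shuffling to a remote task.
    static int chooseTask(List<Integer> targetTasks, Set<Integer> localTasks) {
        for (int task : targetTasks) {
            if (localTasks.contains(task)) {
                return task;
            }
        }
        throw new IllegalStateException(
            "no local target task among " + targetTasks);
    }

    public static void main(String[] args) {
        List<Integer> targets = Arrays.asList(4, 9, 12);
        Set<Integer> local = new HashSet<>(Arrays.asList(9, 20));
        System.out.println(chooseTask(targets, local)); // task 9 is local
    }
}
```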

----------
nathanmarz: Since metrics are written per worker, Storm should actually 
guarantee that there's one ZK metrics executor per worker. And the reason it 
should be :local instead of :local-or-shuffle is because of that guarantee – if 
that executor isn't local then there's a serious problem and there should be an 
error. The ZK metrics executor should be spawned like the SystemBolt in order 
to get the one per worker guarantee, and ensure that if the number of workers 
changes the number of ZKMetrics executors changes appropriately.

----------
mrflip: Understood -- my question at the end regarded other MetricsConsumers, 
not this special one: right now JoeBobMetricConsumer gets shuffle grouping, but 
I was wondering if it should get local-or-shuffle instead. 
The SystemZkSummarizer must be local-or-die, and created specially at the same 
lifecycle as the system bolt.

----------
nathanmarz: Ah, well we should make the type of grouping configurable. 
fieldsGrouping on executor id is probably the most logical default.
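
As a rough illustration of what a fields grouping on executor id would give: every update from the same executor lands on the same consumer task. The sketch below assumes the executor id is available as a string and uses a simple hash-modulo rule; ExecutorIdGroupingSketch and taskIndexFor are hypothetical names, and this is not a description of how Storm implements fieldsGrouping.

```java
// Hypothetical illustration only; not Storm's fieldsGrouping implementation.
class ExecutorIdGroupingSketch {
    // Maps an executor id to a consumer task index in [0, numConsumerTasks).
    static int taskIndexFor(String executorId, int numConsumerTasks) {
        if (numConsumerTasks < 1) {
            throw new IllegalArgumentException("need at least one consumer task");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(executorId.hashCode(), numConsumerTasks);
    }

    public static void main(String[] args) {
        // The same id always maps to the same index.
        System.out.println(taskIndexFor("word-count:[3 3]", 4));
        System.out.println(taskIndexFor("word-count:[3 3]", 4));
    }
}
```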

----------
mrflip: (Addresses #27)




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
