James Xu created STORM-147:
------------------------------
Summary: UI should use metrics framework
Key: STORM-147
URL: https://issues.apache.org/jira/browse/STORM-147
Project: Apache Storm (Incubating)
Issue Type: Improvement
Reporter: James Xu
Priority: Minor
https://github.com/nathanmarz/storm/issues/612
If I understand correctly, the stats framework is deprecated in favor of the
metrics framework. However, the UI currently relies on the older stats
framework, so there are duplicated calls to the stats code all through the
critical loops, and several interesting numbers gathered only by the metrics
framework (heap info, etc.) are absent from the UI.
A CompileAndZookeepMetrics consumer could listen on the metrics stream,
assemble data objects that look the same as what the stats framework produces,
and serialize them into zookeeper. That lets us remove the stats tracking calls
from the executor and makes it easier to add new metrics to the UI, yet doesn't
require changes to the underlying UI code. Also, anyone else could use
zookeeper or thrift to retrieve that synthesized view of the cluster metrics.
My thought is to have one metrics compiler per executor and one per worker.
Each compiler would maintain a composite object and update it as new metrics
roll in. When a new value for, say, the emitted count is received, the compiler
updates that field in place, leaving all other fields at their last-known
values. The compiler would clear out its data object on cleanup().
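A minimal sketch of what that could look like against Storm's existing IMetricsConsumer interface. CompileAndZookeepMetrics is the name proposed above, not existing code, and persistToZookeeper is a placeholder for the actual ZK write:

{code:java}
import backtype.storm.metric.api.IMetricsConsumer;
import backtype.storm.task.IErrorReporter;
import backtype.storm.task.TopologyContext;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: compiles incoming metrics into a stats-like composite object
// per source task and pushes the latest snapshot toward ZooKeeper.
public class CompileAndZookeepMetrics implements IMetricsConsumer {
    // one composite object per (component, task), updated in place as metrics arrive
    private Map<String, Map<String, Object>> compiled;

    @Override
    public void prepare(Map stormConf, Object registrationArgument,
                        TopologyContext context, IErrorReporter errorReporter) {
        compiled = new ConcurrentHashMap<String, Map<String, Object>>();
    }

    @Override
    public void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints) {
        String key = taskInfo.srcComponentId + ":" + taskInfo.srcTaskId;
        Map<String, Object> snapshot = compiled.get(key);
        if (snapshot == null) {
            snapshot = new HashMap<String, Object>();
            compiled.put(key, snapshot);
        }
        // Update only the fields that arrived; all other last-known values stay put.
        for (DataPoint dp : dataPoints) {
            snapshot.put(dp.name, dp.value);
        }
        persistToZookeeper(taskInfo, snapshot);
    }

    @Override
    public void cleanup() {
        // The compiler clears out its data object on cleanup().
        compiled.clear();
    }

    // Placeholder: a real implementation would serialize the snapshot (e.g. into the
    // same thrift structures the stats framework produces) and write it to the
    // worker's node in ZooKeeper.
    private void persistToZookeeper(TaskInfo taskInfo, Map<String, Object> snapshot) {
    }
}
{code}

Such a consumer could presumably be registered like any other metrics consumer, e.g. conf.registerMetricsConsumer(CompileAndZookeepMetrics.class, 1);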
In the current implementation, the workerbeat has information about the worker
and all stats rolled up into a single object. We can have the responses of the
current get*Info() thrift calls stay the same, but there would be an increase
in the number of zookeeper calls needed to build them.
If this is a welcome feature, I believe @arrawatia is excited to implement it.
Data objects stored in ZK: one per worker and one per executor? one per worker?
or one per compiled metric?
At what tempo should the compiler push its compiled view to Zookeeper: on each
metrics update, or on a heartbeat?
(This may be a relative of #527)
----------
nathanmarz: Yes, I would like to see this work done. I think the best would be:
- One Zookeeper metrics consumer per worker.
- All metric stats get routed to the local Zookeeper metrics consumer (we should
make an explicit localGrouping for this that errors if that executor is not there).
- That metrics consumer updates a single node in ZK representing stats for that
worker, the same way it works now.
- It should update ZK after it receives N updates, where N is the number of
executors in that worker. That will keep the tempo at approximately the same
rate as metrics are emitted (see the pacing sketch below).
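A rough illustration of that pacing rule. PacedZkFlusher is a made-up name for illustration, and how the consumer learns the worker's executor count is left open here:

{code:java}
// Illustration of the pacing only; not existing Storm code.
// Count incoming metrics updates and signal a ZK write once every
// executor in the worker has reported, i.e. once per "round" of updates.
public class PacedZkFlusher {
    private final int executorsInWorker;  // N, assumed known to the consumer
    private int updatesSinceFlush = 0;

    public PacedZkFlusher(int executorsInWorker) {
        this.executorsInWorker = executorsInWorker;
    }

    // Called once per metrics update; returns true when the caller
    // should push the compiled view to Zookeeper.
    public boolean noteUpdateAndCheckFlush() {
        updatesSinceFlush++;
        if (updatesSinceFlush >= executorsInWorker) {
            updatesSinceFlush = 0;
            return true;
        }
        return false;
    }
}
{code}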
----------
mrflip: possible:
- Make a MetricsZkSummarizer that populates a thrift-generated object with
metrics and serializes it back to zookeeper.
- Make a subclass SystemZkSummarizer for the specific purpose here; it sends
updates on a tempo of one report per producer.
- Make the UI work with the new object.
beautiful:
- Add fields for other interesting numbers to the worker and executor summaries,
such as GC and disruptor queues.
- Display those interesting numbers on the UI.
fast:
- Make a localGrouping, just like the localOrShuffleGrouping, but which errors
rather than doing a shuffle (sketched below).
- In system-topology! (common.clj), add an add-system-zk-summarizer! method to
attach the SystemZkSummarizer.
- Currently, the metrics consumers are always attached with the :shuffle grouping
(metrics-consumer-bolt-specs). Modify this to get the grouping from the
MetricsConsumer instead.
btw -- would the default grouping of a MetricsConsumer be better off as
:local-or-shuffle, not :shuffle? There doesn't seem to be a reason to deport
metrics if a consumer bolt is handy.
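For the localGrouping item above, a sketch of what it could look like as a CustomStreamGrouping. LocalGrouping is a made-up class name, and the sketch assumes the worker's local task ids are available from the WorkerTopologyContext via getThisWorkerTasks(); treat that lookup as an assumption rather than settled API:

{code:java}
import backtype.storm.generated.GlobalStreamId;
import backtype.storm.grouping.CustomStreamGrouping;
import backtype.storm.task.WorkerTopologyContext;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: behaves like localOrShuffleGrouping, but fails hard instead of
// falling back to a shuffle when no target task lives in this worker.
public class LocalGrouping implements CustomStreamGrouping {
    private List<Integer> localTargets;

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
                        List<Integer> targetTasks) {
        localTargets = new ArrayList<Integer>(targetTasks);
        // Keep only the target tasks running inside this worker
        // (assumes the context exposes the worker's task ids).
        localTargets.retainAll(context.getThisWorkerTasks());
        if (localTargets.isEmpty()) {
            throw new RuntimeException(
                "localGrouping: no target task for " + stream + " lives in this worker");
        }
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        // With one summarizer per worker there is exactly one local target.
        return Collections.singletonList(localTargets.get(0));
    }
}
{code}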
----------
nathanmarz: Since metrics are written per worker, Storm should actually
guarantee that there's one ZK metrics executor per worker. And the reason it
should be :local instead of :local-or-shuffle is because of that guarantee: if
that executor isn't local then there's a serious problem and there should be an
error. The ZK metrics executor should be spawned like the SystemBolt in order
to get the one-per-worker guarantee, and to ensure that if the number of workers
changes, the number of ZKMetrics executors changes appropriately.
----------
mrflip: Understood -- my question at the end was about other MetricsConsumers,
not this special one: right now a JoeBobMetricConsumer gets a shuffle grouping,
but I was wondering if it should get local-or-shuffle instead.
The SystemZkSummarizer must be local-or-die, and created specially, with the
same lifecycle as the system bolt.
----------
nathanmarz: Ah, well we should make the type of grouping configurable.
fieldsGrouping on executor id is probably the most logical default.
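Purely as an illustration of "make the grouping configurable", one shape the change could take; none of these names exist in Storm today:

{code:java}
// Illustration only; these names do not exist in Storm.
// The idea: a MetricsConsumer can declare which grouping the system-topology
// wiring (metrics-consumer-bolt-specs) should use instead of always :shuffle.
public interface ConfigurableGroupingMetricsConsumer {
    enum Grouping {
        SHUFFLE,            // current behavior
        LOCAL_OR_SHUFFLE,   // prefer a consumer task in the same worker
        FIELDS_BY_TASK_ID   // the "fieldsGrouping on executor id" default suggested above
    }

    // Consulted when the metrics consumer bolt is wired into the system topology.
    Grouping preferredGrouping();
}
{code}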
----------
mrflip: (Addresses #27)
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)