[
https://issues.apache.org/jira/browse/STORM-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rick Kellogg updated STORM-147:
-------------------------------
Component/s: storm-core
> UI should use metrics framework
> -------------------------------
>
> Key: STORM-147
> URL: https://issues.apache.org/jira/browse/STORM-147
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Reporter: James Xu
> Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/612
> If I understand correctly, the stats framework is deprecated in favor of the
> metrics framework. However, the UI currently relies on the older stats
> framework, so there are duplicated calls to the stats code all through the
> critical loops, and several interesting numbers gathered only by the metrics
> framework (heap info, etc.) are absent from the UI.
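> For context, consumers of the metrics framework are attached to a topology
> through the standard Config API; for example, the built-in
> LoggingMetricsConsumer is registered like this (Storm 0.9.x package names
> assumed):
> {code:java}
> import backtype.storm.Config;
> import backtype.storm.metric.LoggingMetricsConsumer;
>
> Config conf = new Config();
> // With parallelism 1, a single consumer task receives the metrics stream
> // from every worker in the topology.
> conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
> {code}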
> A CompileAndZookeepMetrics consumer could listen on the metrics stream,
> assemble data objects that look the same as what the stats framework
> produces, and serialize them into zookeeper. That lets us remove the stats
> tracking calls from the executor and makes it easier to add new metrics to
> the UI, yet doesn't require changes to the underlying UI code. Also, anyone
> else could use zookeeper or thrift to retrieve that synthesized view of the
> cluster metrics.
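> A minimal sketch of what such a consumer could look like, assuming the
> standard IMetricsConsumer interface; the class name, the znode layout, and
> the assembleStatsView/writeToZk helpers are made up for illustration:
> {code:java}
> import java.util.Collection;
> import java.util.Map;
>
> import backtype.storm.metric.api.IMetricsConsumer;
> import backtype.storm.task.IErrorReporter;
> import backtype.storm.task.TopologyContext;
>
> // Hypothetical consumer: listens on the metrics stream and mirrors a
> // stats-framework-shaped view of it into Zookeeper.
> public class CompileAndZookeepMetrics implements IMetricsConsumer {
>
>     @Override
>     public void prepare(Map stormConf, Object registrationArgument,
>                         TopologyContext context, IErrorReporter errorReporter) {
>         // Open the Zookeeper connection here (omitted).
>     }
>
>     @Override
>     public void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints) {
>         // Assemble an object shaped like the stats framework's output for this
>         // task, then serialize it under a per-task znode (layout invented here).
>         byte[] statsView = assembleStatsView(taskInfo, dataPoints);
>         writeToZk("/metrics/" + taskInfo.srcComponentId + "/" + taskInfo.srcTaskId,
>                   statsView);
>     }
>
>     @Override
>     public void cleanup() {
>         // Close the Zookeeper connection (omitted).
>     }
>
>     // Stubs: a real implementation would reuse Storm's serialization and
>     // cluster-state helpers rather than talk to Zookeeper directly.
>     private byte[] assembleStatsView(TaskInfo info, Collection<DataPoint> points) {
>         return new byte[0];
>     }
>
>     private void writeToZk(String path, byte[] bytes) {
>     }
> }
> {code}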
> My thought is to have one metrics compiler per executor and one per worker.
> Each compiler would maintain a composite object and update it as new metrics
> roll in. As a new value for, say, the emitted count is received, it updates
> that field in-place, leaving all other last-known values untouched. The
> compiler would clear out its data object on cleanup().
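> In code, each such compiler could be little more than a last-known-value map;
> the class and method names below are illustrative only:
> {code:java}
> import java.util.Collection;
> import java.util.HashMap;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> import backtype.storm.metric.api.IMetricsConsumer.DataPoint;
>
> // Hypothetical per-executor "metrics compiler": keeps the last-known value of
> // every metric and updates fields in place as new data points roll in.
> public class MetricsCompiler {
>     private final Map<String, Object> lastKnown = new ConcurrentHashMap<>();
>
>     public void update(Collection<DataPoint> dataPoints) {
>         for (DataPoint dp : dataPoints) {
>             // Overwrite only the field that changed (say, the emitted count);
>             // all other last-known values stay as they were.
>             lastKnown.put(dp.name, dp.value);
>         }
>     }
>
>     // Snapshot used when serializing the composite object into Zookeeper.
>     public Map<String, Object> view() {
>         return new HashMap<>(lastKnown);
>     }
>
>     // Called from the consumer's cleanup(): drop the accumulated data object.
>     public void clear() {
>         lastKnown.clear();
>     }
> }
> {code}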
> In the current implementation, the workerbeat has information about the
> worker and all stats rolled up into a single object. We can keep the response
> of the current get*Info() thrift calls the same, but there would be an
> increase in the number of Zookeeper calls needed to build it.
> If this is a welcome feature, I believe @arrawatia is excited to implement it.
> Data objects stored in ZK: one per worker and one per executor? one per
> worker? or one per compiled metric?
> At what tempo should the compiler push its compiled view to Zookeeper: on
> each metrics update, or on a heartbeat?
> (This may be a relative of #527)
> ----------
> nathanmarz: Yes, I would like to see this work done. I think the best
> approach would be:
> - One Zookeeper metrics consumer per worker.
> - All metrics stats get routed to the local Zookeeper metrics consumer (there
>   should be an explicit localGrouping for this that errors if that executor
>   is not there).
> - That metrics consumer updates a single node in ZK representing stats for
>   that worker, the same way it works now.
> - It should update ZK after it receives N updates, where N is the number of
>   executors in that worker. That keeps the tempo at approximately the same
>   rate as metrics are emitted (a sketch of this counting scheme follows
>   below).
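> A sketch of that counting scheme, assuming the consumer can learn how many
> executors share its worker (approximated below with the tasks returned by
> getThisWorkerTasks()); the class name and the flush helper are made up:
> {code:java}
> import java.util.Collection;
> import java.util.Map;
>
> import backtype.storm.metric.api.IMetricsConsumer;
> import backtype.storm.task.IErrorReporter;
> import backtype.storm.task.TopologyContext;
>
> // Hypothetical per-worker ZK metrics consumer: flushes once it has heard from
> // every executor in its worker, keeping roughly the metrics-emission tempo.
> public class ZkMetricsConsumer implements IMetricsConsumer {
>     private int updatesPerFlush;   // N = executors (here: tasks) in this worker
>     private int updatesSeen = 0;
>
>     @Override
>     public void prepare(Map stormConf, Object registrationArgument,
>                         TopologyContext context, IErrorReporter errorReporter) {
>         updatesPerFlush = Math.max(1, context.getThisWorkerTasks().size());
>     }
>
>     @Override
>     public void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints) {
>         // Merge dataPoints into the per-worker composite object (omitted), then
>         // rewrite the single per-worker znode every N updates.
>         if (++updatesSeen >= updatesPerFlush) {
>             flushWorkerNodeToZk();
>             updatesSeen = 0;
>         }
>     }
>
>     @Override
>     public void cleanup() {
>     }
>
>     private void flushWorkerNodeToZk() {
>         // Stub: serialize the composite object to the worker's znode.
>     }
> }
> {code}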
> ----------
> mrflip: possible:
> - make a MetricsZkSummarizer that populates a thrift-generated object with
>   metrics and serializes it back to Zookeeper
> - make a subclass SystemZkSummarizer for the specific purpose here
> - it sends updates on a tempo of one report per producer
> - make the UI work with the new object
> beautiful:
> - add fields for other interesting numbers to the worker and executor
>   summaries, such as GC and disruptor queues
> - display those interesting numbers in the UI
> fast:
> - make a localGrouping, just like the localOrShuffleGrouping, but which
>   errors rather than doing a shuffle (see the sketch after this list)
> - in system-topology! (common.clj), add an add-system-zk-summarizer! method
>   to attach the SystemZkSummarizer
> - currently, the metrics consumers are always attached with the :shuffle
>   grouping (metrics-consumer-bolt-specs). Modify this to get the grouping
>   from the MetricsConsumer instead.
> btw -- would the default grouping of a MetricsConsumer be better off as
> :local-or-shuffle, not :shuffle? There doesn't seem to be a reason to ship
> metrics off-worker if a consumer bolt is handy.
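> A sketch of that localGrouping, assuming the standard CustomStreamGrouping
> extension point; the class name is made up:
> {code:java}
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> import backtype.storm.generated.GlobalStreamId;
> import backtype.storm.grouping.CustomStreamGrouping;
> import backtype.storm.task.WorkerTopologyContext;
>
> // Like localOrShuffleGrouping, but errors out instead of shuffling when no
> // target task lives in this worker.
> public class LocalGrouping implements CustomStreamGrouping {
>     private List<Integer> localTargets;
>     private int index = 0;
>
>     @Override
>     public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
>                         List<Integer> targetTasks) {
>         Set<Integer> thisWorker = new HashSet<>(context.getThisWorkerTasks());
>         localTargets = new ArrayList<>();
>         for (Integer task : targetTasks) {
>             if (thisWorker.contains(task)) {
>                 localTargets.add(task);
>             }
>         }
>         if (localTargets.isEmpty()) {
>             throw new RuntimeException(
>                 "localGrouping requires a target task in this worker, but found none");
>         }
>     }
>
>     @Override
>     public List<Integer> chooseTasks(int taskId, List<Object> values) {
>         // Round-robin over local targets; with one summarizer per worker there
>         // is exactly one.
>         index = (index + 1) % localTargets.size();
>         return Collections.singletonList(localTargets.get(index));
>     }
> }
> {code}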
> ----------
> nathanmarz: Since metrics are written per worker, Storm should actually
> guarantee that there's one ZK metrics executor per worker. And the reason it
> should be :local instead of :local-or-shuffle is because of that guarantee –
> if that executor isn't local then there's a serious problem and there should
> be an error. The ZK metrics executor should be spawned like the SystemBolt in
> order to get the one-per-worker guarantee, and to ensure that if the number of
> workers changes, the number of ZKMetrics executors changes appropriately.
> ----------
> mrflip: Understood -- my question at the end regarded other MetricsConsumers,
> not this special one: right now JoeBobMetricConsumer gets shuffle grouping,
> but I was wondering if it should get local-or-shuffle instead.
> The SystemZkSummarizer must be local-or-die, and created specially with the
> same lifecycle as the system bolt.
> ----------
> nathanmarz: Ah, well we should make the type of grouping configurable.
> fieldsGrouping on executor id is probably the most logical default.
> ----------
> mrflip: (Addresses #27 )
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)