[ 
https://issues.apache.org/jira/browse/STORM-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-147:
-------------------------------
    Component/s: storm-core

> UI should use metrics framework
> -------------------------------
>
>                 Key: STORM-147
>                 URL: https://issues.apache.org/jira/browse/STORM-147
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/612
> If I understand correctly, the stats framework is deprecated in favor of the 
> metrics framework. However, the UI currently relies on the older stats 
> framework, and so there are duplicated calls to the stats code all through 
> the critical loops, and several interesting numbers gathered only by the 
> metrics framework (heap info, etc) absent from the UI.
> A CompileAndZookeepMetrics consumer could listen on the metrics stream, 
> assemble data objects that look the same as what the stats framework 
> produces, and serialize them into zookeeper. That lets us remove the stats 
> tracking calls from the executor and makes it easier to add new metrics to 
> the UI, yet doesn't require changes to the underlying UI code. Also, anyone 
> else could use zookeeper or thrift to retrieve that synthesized view of the 
> cluster metrics.
> My thought is to have one metrics compiler per executor and one per worker. 
> Each compiler would maintain a composite object and update it as new metrics 
> roll in. As a new value for, say, the emitted count arrives, it updates 
> that field in place, leaving all other last-known values intact. The compiler would 
> clear out its data object on cleanup().
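The per-executor compiler described above could be sketched as follows. This is an illustrative sketch only, not Storm API: the class and method names (`MetricsCompiler`, `handleDataPoint`, `snapshot`) are assumptions. It keeps a composite of last-known values, updates fields in place as new metrics roll in, and clears its state on `cleanup()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed per-executor metrics compiler.
public class MetricsCompiler {
    private final Map<String, Object> composite = new HashMap<>();

    // Update a single field in place, leaving all other last-known values.
    public void handleDataPoint(String name, Object value) {
        composite.put(name, value);
    }

    // Snapshot of the current compiled view (e.g. to serialize into Zookeeper).
    public Map<String, Object> snapshot() {
        return new HashMap<>(composite);
    }

    // Clear out the data object on cleanup(), as proposed above.
    public void cleanup() {
        composite.clear();
    }
}
```

The snapshot is what the consumer would serialize into Zookeeper, shaped to look like what the stats framework produces today.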
> In the current implementation, the workerbeat has information about the 
> worker and all stats rolled up into a single object. We can have the response 
> of the current get*Info() thrift calls stay the same, but there would be an 
> increase in the number of Zookeeper calls needed to build it.
> If this is a welcome feature, I believe @arrawatia is excited to implement it.
> Data objects stored in ZK: one per worker and one per executor? one per 
> worker? or one per compiled metric?
> At what tempo should the compiler push its compiled view to Zookeeper: on each 
> metrics update, or on a heartbeat?
> (This may be a relative of #527)
> ----------
> nathanmarz: Yes, I would like to see this work done. I think the best would 
> be:
> One Zookeeper metrics consumer per worker
> All metrics stats get routed to local Zookeeper metrics consumer (should make 
> an explicit localGrouping for this that errors if that executor is not there)
> That metrics consumer updates a single node in ZK representing stats for that 
> worker, the same way it works now.
> It should update ZK after it receives N updates, where N is the number of 
> executors in that worker. That will keep the tempo at approximately the same 
> rate as metrics are emitted.
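The flush tempo proposed here can be sketched as a small counter: the worker's ZK metrics consumer buffers incoming executor updates and writes to Zookeeper only after receiving N updates, where N is the number of executors in that worker. All names below (`WorkerMetricsConsumer`, `onUpdate`) are illustrative assumptions, and the flush callback stands in for the actual Zookeeper write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: flush to ZK once per round of executor updates.
public class WorkerMetricsConsumer {
    private final int numExecutors;                    // N = executors in this worker
    private final Map<String, Object> pending = new HashMap<>();
    private final Consumer<Map<String, Object>> flush; // stand-in for the ZK write
    private int updatesSinceFlush = 0;

    public WorkerMetricsConsumer(int numExecutors, Consumer<Map<String, Object>> flush) {
        this.numExecutors = numExecutors;
        this.flush = flush;
    }

    public void onUpdate(String executorId, Object stats) {
        pending.put(executorId, stats);
        if (++updatesSinceFlush >= numExecutors) {
            // One ZK node per worker, updated at roughly the metrics emit rate.
            flush.accept(new HashMap<>(pending));
            updatesSinceFlush = 0;
        }
    }
}
```

Counting updates rather than using a timer keeps the write tempo tied to the metrics emit rate, as suggested above.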
> ----------
> mrflip: possible:
> make a MetricsZkSummarizer that populates a thrift-generated object with 
> metrics and serializes them back to zookeeper
> make a subclass SystemZkSummarizer for the specific purpose here
> it sends updates on a tempo of one report per producer
> make the UI work with new object
> beautiful:
> Add fields for other interesting numbers to the worker and executor 
> summaries, such as GC and disruptor queues
> display those interesting numbers on UI
> fast:
> make a localGrouping, just like the localOrShuffleGrouping, but which errors 
> rather than doing a shuffle
> In system-topology! (common.clj), add an add-system-zk-summarizer! method 
> to attach the SystemZkSummarizer
> currently, the metrics consumers are attached always with the :shuffle 
> grouping (metrics-consumer-bolt-specs). Modify this to get the grouping from 
> the MetricsConsumer instead.
> btw -- would the default grouping of a MetricsConsumer be better off as 
> :local-or-shuffle, not :shuffle? There doesn't seem to be a reason to deport 
> metrics if a consumer bolt is handy.
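The proposed localGrouping semantics can be sketched without depending on Storm's `CustomStreamGrouping` interface: like local-or-shuffle it prefers a target task in the same worker, but instead of falling back to a shuffle it fails loudly when no target task is local. The class and method names here are illustrative assumptions.

```java
import java.util.List;

// Hypothetical sketch of a grouping that errors rather than shuffling.
public class LocalGrouping {
    private final List<Integer> localTaskIds; // tasks running in this worker

    public LocalGrouping(List<Integer> localTaskIds) {
        this.localTaskIds = localTaskIds;
    }

    // Choose among the consuming component's target tasks; error rather
    // than shuffle when none of them is local.
    public int chooseTask(List<Integer> targetTasks) {
        for (int task : targetTasks) {
            if (localTaskIds.contains(task)) {
                return task;
            }
        }
        throw new IllegalStateException(
            "localGrouping: no target task is local to this worker");
    }
}
```

Failing fast here matches the guarantee discussed below: if the ZK metrics executor is not local, something is seriously wrong and an error is preferable to silently shipping metrics elsewhere.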
> ----------
> nathanmarz: Since metrics are written per worker, Storm should actually 
> guarantee that there's one ZK metrics executor per worker. And the reason it 
> should be :local instead of :local-or-shuffle is because of that guarantee – 
> if that executor isn't local then there's a serious problem and there should 
> be an error. The ZK metrics executor should be spawned like the SystemBolt in 
> order to get the one per worker guarantee, and ensure that if the number of 
> workers changes the number of ZKMetrics executors change appropriately.
> ----------
> mrflip: Understood -- my question at the end regarded other MetricsConsumers, 
> not this special one: right now JoeBobMetricConsumer gets shuffle grouping, 
> but I was wondering if it should get local-or-shuffle instead. 
> The SystemZkSummarizer must be local-or-die, and created specially at the 
> same lifecycle as the system bolt.
> ----------
> nathanmarz: Ah, well we should make the type of grouping configurable. 
> fieldsGrouping on executor id is probably the most logical default.
> ----------
> mrflip: (Addresses #27)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
