I am working on adding queue depth metrics. https://github.com/apache/storm/pull/1406
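For context, the queue-depth idea discussed further down the thread boils down to something like the sketch below: expose the population of an executor's receive queue as a cheap gauge instead of relying on sampled execute latency. This is only an illustration on top of a Codahale MetricRegistry; the class, the metric names, and the plain BlockingQueue stand-in are illustrative and not taken from PR 1406.

import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Illustrative sketch only: exposes the depth of an executor's receive
 * queue as a gauge. The queue handle here is a plain BlockingQueue; the
 * real receive queue in Storm is a different type, so treat this as the
 * shape of the idea, not the actual implementation in the PR.
 */
public class QueueDepthMetrics {

    public static void register(MetricRegistry registry,
                                String componentId,
                                BlockingQueue<?> receiveQueue) {
        // Reading size() once per reporting interval is cheap compared to
        // timing every tuple, which is why queue depth is attractive as a
        // load signal.
        registry.register(MetricRegistry.name("queue", componentId, "depth"),
                          (Gauge<Integer>) receiveQueue::size);
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        register(registry, "my-bolt", queue);
        System.out.println(registry.getGauges().keySet()); // [queue.my-bolt.depth]
    }
}

A depth gauge like this can answer the "how overloaded is this bolt" question without the capacity-over-100% artifacts mentioned below, since it never needs to time individual tuples.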
On Mon, May 23, 2016 at 7:21 PM, Cody Innowhere <e.neve...@gmail.com> wrote:

> Taking queue depth is a good idea for measuring the load of components, though there are some scenarios where users want to know the latency of a specific method/processing flow.
> BTW, we've added support to turn metrics on/off dynamically in JStorm (actually we had this feature in JStorm 2.1.1, but there was a bug in the implementation), like metrics filtering. I think this is a pretty practical requirement.
>
> I agree with you that it's not a simple 1-to-1 translation; actually the new metrics requirements look more like refactoring than straightforward improvements.
>
> On Mon, May 23, 2016 at 9:36 PM, Bobby Evans <ev...@yahoo-inc.com.invalid> wrote:
>
> > From JStorm we got the ability to show most of the metrics on the UI, but I think we missed the ability to turn metrics on and off dynamically. This is important because there will be metrics that are expensive to gather, but very important when debugging specific problems.
> > I also think we should spend some time thinking about why we are gathering/computing specific metrics: what is the question we are trying to answer, and is there a better way to answer it?
> > For example, we collect a lot of latency statistics for a bolt to answer a few different questions, but for the most part the latency stats we have are bad at answering either of those questions.
> >
> > 1) How overloaded is a particular bolt? aka capacity.
> > 2) How does this bolt contribute to the overall processing time in a topology?
> >
> > For the first one we subsample because it is expensive to compute the latency every time. But in practice I have seen many bolts with a capacity well over 1; I have even seen a 35. How can a bolt be running 3500% of the time? If we use the input queue depth instead, that could give us a much better picture, and be a lot less expensive.
> >
> > For the second one we have talked about adding in more latency measurements (serialization/deserialization time, etc.), but I think we will end up playing whack-a-mole unless we take a holistic approach where we don't just measure small parts of the system, but measure latency from point A to point B and from point B to point C, so there are no gaps.
> >
> > I just don't think we want to have a simple 1-to-1 translation.
> > - Bobby
> >
> > On Monday, May 23, 2016 5:05 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
> >
> > Thanks for following up, Abhishek. I agree with you about the consensus. I think you already listed most of my / the community's requirements. I'm adding some things I have in mind:
> >
> > 7. Aggregation at stream level (STORM-1719), and at machine level
> > 8. A way to subscribe to cluster metrics (STORM-1723)
> > 9. Counter stats as non-sampled, if it doesn't hurt performance
> > 10. More metrics, such as serialization/deserialization latency and queue status
> >
> > I'd also like to see Spout offset information if the Spout implementation can provide it, but that's just my 2 cents.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Mon, May 23, 2016 at 6:33 PM, Abhishek Agarwal <abhishc...@gmail.com> wrote:
> >
> > So I am assuming that there is a general consensus on adding a new API for metrics and gradually phasing out the old one. If yes, maybe we can work toward finer details of how to maintain the two APIs as well as the design of the new API.
> >
> > Jungtaek, it would be better to summarize the requirements and let others add what they feel is missing. Some asks I have seen:
> >
> > 1. Aggregation at component level (Average, Sum, etc.)
> > 2. Blacklist/whitelist
> > 3. Allow only numbers for values
> > 4. Efficient routing of built-in metrics to the UI (currently they get tagged along with executor heartbeats, which puts pressure on ZooKeeper)
> > 5. Worker/JVM-level metrics which are not owned by a particular component
> > 6. Percentiles for latency metrics, such as p99, p95, etc.
> >
> > Not all of them may be considered. Please add anything I might have missed.
> >
> > On Fri, May 20, 2016 at 5:57 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
> >
> > > Personally I'm also in favor of maintaining the old API (but deprecated) and adding a new API. It's the ideal way, it's what many projects try to do, and it's what I do on the other project I maintain.
> > >
> > > And I also prefer that the current metrics feature be gone in the next major release. In general, before each major release we should discuss which classes/features to drop. I think we forgot this when we released 1.0.0, and the deprecated things are all still alive.
> > >
> > > I'm also in favor of dropping the support for a custom frequency for each metric. I guess we don't need to worry about the mapping, since the internal metrics in Storm will be moved to the new metrics feature. While some behavioral changes could occur (for example, IMetricsConsumer could be changed to no longer receive Storm's built-in metrics), it will not break backward compatibility from the API side anyway.
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > > On Fri, May 20, 2016 at 12:57 AM, Abhishek Agarwal <abhishc...@gmail.com> wrote:
> > >
> > > > Sounds good. Having two separate metric reporters may be confusing, but it is better than breaking the client code.
> > > >
> > > > The Codahale library allows the user to specify the frequency per reporter instance. Storm, on the other hand, allows a different reporting frequency for each metric. How will that mapping work? I am OK with dropping the support for a custom frequency for each metric. Internal metrics in Storm use the same reporting frequency anyway.
> > > >
> > > > On Thu, May 19, 2016 at 9:04 PM, Bobby Evans <ev...@yahoo-inc.com.invalid> wrote:
> > > >
> > > > > I personally would like to see that change happen differently for the two branches.
> > > > > On 1.x we add in a new API, for both reporting and collecting metrics, in parallel to the old API. We leave IMetric and IMetricsConsumer in place but deprecated. As we move internal metrics over from the old interface to the new one, we either keep versions of the old ones in place or we provide a translation shim going from the new to the old.
> > > > >
> > > > > In 2.x either the old way is gone completely or it is off by default. I prefer gone completely.
> > > > >
> > > > > If we go off of dropwizard/codahale metrics, or a layer around them like was discussed previously, it seems fairly straightforward to take some of our current metrics that all trigger at the same interval and set up a reporter that can translate them into the format that was reported previously.
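To make the translation idea above concrete, one possible shape is a custom Codahale ScheduledReporter that repackages the registry contents on each tick and hands them to the legacy consumer path. The LegacyMetricsConsumer interface below is a hypothetical stand-in for the existing IMetricsConsumer plumbing, so treat this strictly as a sketch of the approach, not the real shim.

import com.codahale.metrics.*;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a reporter that forwards Codahale metrics to a legacy-style
 * consumer. LegacyMetricsConsumer is a hypothetical stand-in for the old
 * IMetricsConsumer path, used here only to show the translation idea.
 */
public class LegacyBridgeReporter extends ScheduledReporter {

    /** Hypothetical callback mimicking the old "name -> value" data points. */
    public interface LegacyMetricsConsumer {
        void handleDataPoints(Map<String, Object> dataPoints);
    }

    private final LegacyMetricsConsumer consumer;

    public LegacyBridgeReporter(MetricRegistry registry, LegacyMetricsConsumer consumer) {
        super(registry, "legacy-bridge", MetricFilter.ALL,
              TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
        this.consumer = consumer;
    }

    @Override
    public void report(SortedMap<String, Gauge> gauges,
                       SortedMap<String, Counter> counters,
                       SortedMap<String, Histogram> histograms,
                       SortedMap<String, Meter> meters,
                       SortedMap<String, Timer> timers) {
        Map<String, Object> dataPoints = new HashMap<>();
        gauges.forEach((name, gauge) -> dataPoints.put(name, gauge.getValue()));
        counters.forEach((name, counter) -> dataPoints.put(name, counter.getCount()));
        // Histograms/meters/timers would be flattened similarly (mean, p99, ...).
        consumer.handleDataPoints(dataPoints);
    }
}

A reporter like this is started with a single fixed period, e.g. reporter.start(60, TimeUnit.SECONDS), which is exactly the per-reporter rather than per-metric frequency that Abhishek asks about above.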
> > > > > In 1.x, to get a full picture of what is happening in your topology, you may need two separate reporters, one for the new metrics and one for the old, but it should only be for a short period of time.
> > > > > - Bobby
> > > > >
> > > > > On Thursday, May 19, 2016 1:00 AM, Cody Innowhere <e.neve...@gmail.com> wrote:
> > > > >
> > > > > If we want to refactor the metrics system, I think we may have to incur breaking changes. We can make it backward compatible, but that means we may have to build an adaptation layer on top of the metrics, or a lot of "if...else..." which might be ugly; either way, it might be a pain to maintain the code.
> > > > > So I prefer making breaking changes if we want to build a new metrics system, I'm OK with moving the JStorm metrics migration phase forward to 1.x, and I'm happy to share our design & experiences.
> > > > >
> > > > > On Thu, May 19, 2016 at 11:12 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
> > > > >
> > > > > > Hi devs,
> > > > > >
> > > > > > I'd like to hear opinions on breaking changes to metrics for 1.x.
> > > > > >
> > > > > > Some background here:
> > > > > >
> > > > > > - As you may have seen, I'm trying to address some places where metrics could be improved without breaking backward compatibility, but it's limited by the IMetric interface, which is exposed to the public.
> > > > > > - We're working on Storm 2.0.0, and evaluation/adoption of the JStorm metrics feature is planned for phase 2, but we don't know the estimated release date, and I feel it's not in the near future.
> > > > > > - We've just released Storm 1.0.x, so I expect the lifetime of Storm 1.0 (even 0.9) to be months or even years.
> > > > > >
> > > > > > If someone wants to know exactly what matters about the current metrics feature, please let me know and I will summarize it.
> > > > > >
> > > > > > I have other ideas in mind to relieve some of the problems with the current metrics, so I'm also OK with postponing the renewal of metrics to 2.0.0 and applying those workaround ideas. But if we're willing to address metrics on 1.x, IMO we can consider breaking backward compatibility from 1.x for once.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Thanks,
> > > > > > Jungtaek Lim (HeartSaVioR)
> > > >
> > > > --
> > > > Regards,
> > > > Abhishek Agarwal
> >
> > --
> > Regards,
> > Abhishek Agarwal

--
Regards,
Abhishek Agarwal
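Since turning expensive metrics on and off at runtime (and the blacklist/whitelist ask) comes up several times in the thread, here is one way it could look on top of Codahale: a mutable MetricFilter that the reporter consults on every reporting tick. The prefix scheme and the ConsoleReporter are placeholders for whatever naming and transport Storm would actually use.

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Metric;
import com.codahale.metrics.MetricFilter;
import com.codahale.metrics.MetricRegistry;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of runtime metric filtering: reporters consult the filter on each
 * reporting tick, so adding/removing prefixes here effectively turns
 * metrics on and off without restarting anything.
 */
public class DynamicMetricFilter implements MetricFilter {

    private final Set<String> disabledPrefixes = ConcurrentHashMap.newKeySet();

    @Override
    public boolean matches(String name, Metric metric) {
        return disabledPrefixes.stream().noneMatch(name::startsWith);
    }

    public void disable(String prefix) { disabledPrefixes.add(prefix); }
    public void enable(String prefix)  { disabledPrefixes.remove(prefix); }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        DynamicMetricFilter filter = new DynamicMetricFilter();

        // Each reporter instance has a single reporting period (the
        // per-reporter rather than per-metric frequency discussed above).
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .filter(filter)
                .build();
        reporter.start(10, TimeUnit.SECONDS);

        registry.timer("my-bolt.execute-latency");
        filter.disable("my-bolt.execute-latency"); // expensive: off for now
        filter.enable("my-bolt.execute-latency");  // back on while debugging
    }
}

A side note on requirement 6: a Codahale Timer already exposes p95/p99 through its snapshot, so percentile latencies come essentially for free once latencies are recorded in a Timer or Histogram.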