Abhishek,

The information about what your company is facing is exactly what I wanted to hear from the community. Thanks for providing it in detail.
As for me, I haven't faced anything directly, but I've heard that attaching a metrics consumer caused a huge degradation in the performance of the whole topology. There are two things I suspect:

1. the metrics consumer can't keep up
2. the massive number of metrics tuples (due to having lots of tasks), some of which need to be serialized/deserialized and transferred

I'm addressing 1 with an async metrics consumer bolt and a metrics filter on the consumer, but as I said, IMetric makes it really hard to resolve 2.

I strongly agree that the metrics tick doesn't make sense at all, so I tried to fix it, but IMetric doesn't guarantee thread safety, so metrics can't be collected from other threads (which also means from other places), and that also prevents aggregation at the worker level.

I have some ideas in mind to help reduce the problem while keeping backward compatibility, but I also agree that we should take a new approach if we want to clear out all these problems. It seems we need to start a thread for discussion or a vote.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, May 19, 2016 at 2:55 AM, Abhishek Agarwal <[email protected]> wrote:

> I remember from a previous discussion that codahale metrics are shaded
> inside storm-core and that breaks compatibility with any existing
> plugins/reporters. Will it not be a problem here? And by the way, does it
> need to be shaded?
>
> @Jungtaek - Exactly what are the core issues you have run into w.r.t.
> metrics? At my company, we make heavy use of metrics, and there were two
> major issues we faced:
>
>    - explosion of metrics as the number of tasks increases - This put a
>    lot of unnecessary load on the graphite servers even though we were
>    only interested in machine-level aggregated metrics. Aggregation is
>    difficult to solve while keeping backward compatibility intact.
>    - metric tick in the same queue as the message queue - If a bolt is
>    slow or blocked, metrics for that bolt will not be emitted, since the
>    metric tick won't be consumed by the bolt.
>    It can cause a lot of confusion. [Refer STORM-972]
>    - Only averages are emitted for latency in many places, while
>    histograms are more useful.
>
> I know you are trying to solve many problems with metric collection, but
> solving these problems independently of each other might not be the best
> approach. I would vote for implementing a backward-incompatible solution
> if it solves all these problems in a clean way.
>
> On Wed, May 18, 2016 at 9:55 PM, P. Taylor Goetz <[email protected]>
> wrote:
>
> > +1 for standardizing on dropwizard / Coda Hale’s metrics library. It’s
> > a solid library that’s widely used and understood.
> >
> > -Taylor
> >
> > > On May 18, 2016, at 10:22 AM, Bobby Evans <[email protected]>
> > > wrote:
> > >
> > > There are a lot of things that I dislike about IMetric. It provides
> > > too much flexibility and at the same time not enough
> > > information/conventions to be able to interpret the numbers it
> > > returns correctly. We recently had a case where someone was trying to
> > > compute an average using a ReducedMetric and a MeanReducer (which, by
> > > the way, should be deprecated because it is fundamentally flawed).
> > > This hands the metrics collector an average. How is it supposed to
> > > combine one average with another when doing a roll-up, either across
> > > components or across time ranges? It just does not work
> > > mathematically unless you know that all of the averages had the exact
> > > same number of operations in them, which we cannot know.
> > > This is why dropwizard and other metrics systems have a specific set
> > > of supported metrics, not Object, that they know mathematically work
> > > out. A gauge is different from a counter, which is different from a
> > > ratio, or a meter, or a timer, or a histogram. Please let's not
> > > reinvent the wheel here; we already did it wrong once, let's not do
> > > it wrong again.
> > > We are using dropwizard in other places in the code internally. I
> > > would prefer that we standardize on it, or on a thin wrapper around
> > > it based on the same concepts. Or if there is a different API that
> > > someone here would prefer that we use, that is fine with me too. But
> > > let's not write it ourselves; let's take from the experts who have
> > > spent a long time building something that works.
> > > - Bobby
> > >
> > > On Tuesday, May 17, 2016 10:10 PM, Jungtaek Lim <[email protected]>
> > > wrote:
> > >
> > > Hi devs,
> > >
> > > Since IMetric#getValueAndReset doesn't restrict the return type, it
> > > gives us flexibility, but the metrics consumer has to parse the value
> > > without context (relying on just some assumptions).
> > >
> > > I've looked into some open source metrics consumers, and many of them
> > > support Number and Map<String, Number/String>, and one of them
> > > supports nested Maps. For the case of a Map, its key is appended to
> > > the metric key and its value is converted to 'double'. I think that
> > > would be enough, but I'm not sure we can rely on all metrics
> > > consumers to handle it properly.
> > >
> > > I feel it would be great if we could recommend proper types of
> > > DataPoint values for storing metrics to a time-series DB via the
> > > metrics consumer. It could serve as a protocol between IMetric users
> > > and metrics consumer developers.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > > ps. I'm not a heavy user of time-series DBs (I researched some, but
> > > they don't document the type/size of values clearly), so if someone
> > > could share information on the supported type/size of values in
> > > time-series DBs, that would be great for me. Or we can just assume
> > > numbers are 'double' as above and go forward.
>
> --
> Regards,
> Abhishek Agarwal
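Bobby's point about MeanReducer can be made concrete with a small sketch. This is not code from the thread; the class and method names are hypothetical. Once each task reports only its mean, all a collector can do in a roll-up is average the averages, which diverges from the true mean whenever the per-task operation counts differ:

```java
// Hypothetical illustration of why roll-ups over pre-computed averages fail.
// Task A: 2 operations totaling 20 ms (mean 10.0 ms);
// Task B: 100 operations totaling 100 ms (mean 1.0 ms).
public class MeanRollup {
    // All a collector can do when handed two bare means (MeanReducer-style).
    public static double averageOfAverages(double meanA, double meanB) {
        return (meanA + meanB) / 2.0;
    }

    // What it could compute if sums and counts were reported instead.
    public static double trueMean(double sumA, long countA,
                                  double sumB, long countB) {
        return (sumA + sumB) / (countA + countB);
    }

    public static void main(String[] args) {
        // Averaging the averages weights the 2-op task equally with the
        // 100-op task and badly overstates latency.
        System.out.println(averageOfAverages(10.0, 1.0));  // 5.5 ms
        System.out.println(trueMean(20.0, 2, 100.0, 100)); // ~1.18 ms
    }
}
```

This is the gap typed metrics close: a dropwizard counter, histogram, or timer carries counts and distribution state that compose correctly across components and time ranges, whereas an opaque `Object` from `getValueAndReset` does not.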
