>From JStorm we got the ability to show most of the metrics on the UI, but I 
>think we missed the ability to turn metrics on and off dynamically.  This is 
>important because there will be metrics that are expensive to gather, but very 
>important when debugging specific problems.
I also think we should spend some time thinking about why we are 
gathering/computing specific metrics: what question are we trying to 
answer, and is there a better way to answer it?
For example, we collect a lot of latency statistics for a bolt to answer two 
different questions, but for the most part the latency stats we have are bad at 
answering either of them.

1) How overloaded is a particular bolt? (aka capacity)
2) How does this bolt contribute to the overall processing time in a topology?
For the first one we subsample because it is expensive to compute the latency 
every time.  But in practice I have seen many bolts with a capacity well over 
1; I have even seen a 35.  How can a bolt be running 3500% of the time? If we 
used the input queue depth instead, that could give us a much better picture 
and be a lot less expensive.
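To make that concrete, here is a small illustrative sketch of the capacity arithmetic (the helper and the numbers are hypothetical, not Storm's actual code) showing how a skewed sampled latency can push the figure far past 1:

```java
// Illustrative sketch only (hypothetical numbers, not Storm's code):
// capacity ~= fraction of the window the executor spent executing.
public class CapacitySketch {
    static double capacity(long executed, double avgLatencyMs, long windowMs) {
        return executed * avgLatencyMs / windowMs;
    }

    public static void main(String[] args) {
        long windowMs = 600_000;    // 10-minute metrics window
        long executed = 1_200_000;  // tuples executed in that window

        // With the true mean execute latency the number is believable:
        System.out.println(capacity(executed, 0.4, windowMs));  // 0.8

        // With subsampling, a few slow outliers can skew the sampled mean,
        // and the reported "capacity" becomes nonsense:
        System.out.println(capacity(executed, 18.0, windowMs)); // 36.0
    }
}
```

A queue-depth signal sidesteps this entirely: it is a direct reading, not a product of a sampled estimate.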

For the second one we have talked about adding more latency measurements 
(serialization/deserialization time, etc.), but I think we will end up playing 
whack-a-mole unless we take a holistic approach where we don't just measure 
small parts of the system, but measure latency from point A to point B and 
from point B to point C, so there are no gaps.
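A rough sketch of what I mean by gap-free measurement (the class and the point names are made up for illustration): stamp a tuple at fixed points and report only adjacent segments, so the segments always sum exactly to the end-to-end total:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (names made up): timestamp a tuple at fixed points
// and report only adjacent segments, so they sum to the total with no gaps.
public class SegmentLatency {
    private final Map<String, Long> stamps = new LinkedHashMap<>();

    void mark(String point, long nanos) {
        stamps.put(point, nanos);
    }

    // Latency of each adjacent segment (A->B, B->C, ...).
    Map<String, Long> segments() {
        Map<String, Long> out = new LinkedHashMap<>();
        String prevPoint = null;
        long prevTime = 0;
        for (Map.Entry<String, Long> e : stamps.entrySet()) {
            if (prevPoint != null) {
                out.put(prevPoint + "->" + e.getKey(), e.getValue() - prevTime);
            }
            prevPoint = e.getKey();
            prevTime = e.getValue();
        }
        return out;
    }

    public static void main(String[] args) {
        SegmentLatency t = new SegmentLatency();
        t.mark("A", 0);           // e.g. enqueue
        t.mark("B", 5_000_000);   // e.g. deserialization done
        t.mark("C", 12_000_000);  // e.g. execute done
        System.out.println(t.segments()); // {A->B=5000000, B->C=7000000}
    }
}
```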

I just don't think we want a simple 1-to-1 translation.
 - Bobby 

    On Monday, May 23, 2016 5:05 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
 

 Thanks for following up, Abhishek. I agree with you about the consensus.
I think you already listed most of my / the community's requirements. I'm 
adding a few items I have in mind:

7. Aggregation at stream level (STORM-1719), and at machine level
8. A way to subscribe to cluster metrics (STORM-1723)
9. Counter stats as non-sampled if it doesn't hurt performance
10. More metrics, like serialization/deserialization latency and queue status
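For item 9, a minimal sketch of why counters can stay non-sampled (an illustrative class, not a proposed API): with java.util.concurrent.atomic.LongAdder the per-event cost is a low-contention increment, so every tuple can be counted:

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch (not a proposed API): LongAdder keeps the per-event
// cost to a low-contention increment, so counters need no sampling.
public class NonSampledCounter {
    private final LongAdder acked = new LongAdder();

    void onAck() {
        acked.increment(); // called for every tuple, no sampling
    }

    long snapshotAndReset() {
        return acked.sumThenReset(); // value for one reporting interval
    }

    public static void main(String[] args) {
        NonSampledCounter c = new NonSampledCounter();
        for (int i = 0; i < 3; i++) c.onAck();
        System.out.println(c.snapshotAndReset()); // 3
        System.out.println(c.snapshotAndReset()); // 0
    }
}
```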
I'd also like to see Spout offset information if the Spout implementation can 
provide it, but that's just my 2 cents.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Monday, May 23, 2016 at 6:33 PM, Abhishek Agarwal <abhishc...@gmail.com> wrote:

So I am assuming that there is a general consensus on adding a new API for
metrics and gradually phasing out the old one. If yes, maybe we can work
toward the finer details of how to maintain the two APIs, as well as the
design of the new API.

Jungtaek, it would be better to summarize the requirements and let others
add what they feel is missing. Some asks I have seen:

1. Aggregation at component level (average, sum, etc.)
2. Blacklist/whitelist
3. Allow only numbers for values
4. Efficient routing of built-in metrics to the UI (currently they get tagged
along with executor heartbeats, which puts pressure on ZooKeeper)
5. Worker/JVM level metrics which are not owned by a particular component
6. Percentiles for latency metrics such as p99, p95 etc

Not all of them may be considered. Please add anything I might have missed.
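For item 6, a minimal sketch of the nearest-rank percentile computation behind figures like p95/p99 (illustrative code only; a real implementation would more likely use a Codahale histogram with a reservoir than a full sorted copy per query):

```java
import java.util.Arrays;

// Illustrative sketch: nearest-rank percentile over a window of latencies.
public class LatencyPercentiles {
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // nearest rank
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        double[] ms = new double[100];
        for (int i = 0; i < 100; i++) ms[i] = i + 1; // latencies 1..100 ms
        System.out.println(percentile(ms, 95)); // 95.0
        System.out.println(percentile(ms, 99)); // 99.0
    }
}
```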



On Fri, May 20, 2016 at 5:57 AM, Jungtaek Lim <kabh...@gmail.com> wrote:

> Personally I'm also in favor of maintaining the old API (but deprecated) and
> adding a new API.
> That's the ideal way; it's what many projects try to do, including another
> project that I maintain.
>
> And I also prefer removing the current metrics feature in the next major
> release. In general, before each major release we should discuss which
> classes/features to drop. I think we forgot this when we released 1.0.0, and
> all the deprecated things are still alive.
>
> I'm also in favor of dropping the support for a custom frequency for each
> metric. I guess we don't need to worry about the mapping since the internal
> metrics in Storm will be moved to the new metrics feature. While some
> behavioral changes could occur (for example, IMetricsConsumer could be
> changed to no longer receive Storm's built-in metrics), it will not break
> backward compatibility on the API side anyway.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Friday, May 20, 2016 at 12:57 AM, Abhishek Agarwal <abhishc...@gmail.com> wrote:
>
> > Sounds good. Having two separate metric reporters may be confusing, but it
> > is better than breaking the client code.
> >
> > The Codahale library allows the user to specify the frequency per reporter
> > instance. Storm, on the other hand, allows a different reporting frequency
> > for each metric. How will that mapping work? I am OK with dropping the
> > support for a custom frequency for each metric. Internal metrics in Storm
> > anyway use the same reporting frequency.
> >
> > On Thu, May 19, 2016 at 9:04 PM, Bobby Evans <ev...@yahoo-inc.com.invalid
> >
> > wrote:
> >
> > > I personally would like to see that change happen differently for the two
> > > branches.
> > > On 1.x we add in a new API, for both reporting metrics and collecting
> > > them, in parallel to the old API.  We leave IMetric and IMetricsConsumer
> > > in place but deprecated.  As we move internal metrics over from the old
> > > interface to the new one, we either keep versions of the old ones in
> > > place or we provide a translation shim going from the new to the old.
> > >
> > > In 2.x either the old way is gone completely or it is off by default.  I
> > > prefer gone completely.
> > >
> > > If we go off of dropwizard/codahale metrics, or a layer around them like
> > > was discussed previously, it seems fairly straightforward to take some of
> > > our current metrics that all trigger at the same interval and set up a
> > > reporter that can translate them into the format that was reported
> > > previously.
> > > In 1.x, to get a full picture of what is happening in your topology you
> > > may need two separate reporters, one for the new metrics and one for the
> > > old, but it should only be for a short period of time. - Bobby
> > >
> > >     On Thursday, May 19, 2016 1:00 AM, Cody Innowhere <e.neve...@gmail.com> wrote:
> > >
> > >
> > >  If we want to refactor the metrics system, I think we may have to incur
> > > breaking changes. We can make it backward compatible, but that means we
> > > may have to build an adapter layer on top of metrics, or write a lot of
> > > "if...else..." which might be ugly; either way, it might be a pain to
> > > maintain the code.
> > > So I prefer making breaking changes if we want to build a new metrics
> > > system, and I'm OK with moving the JStorm metrics migration phase forward
> > > to 1.x, and I'm happy to share our design & experiences.
> > >
> > > On Thu, May 19, 2016 at 11:12 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I'd like to hear your opinions on breaking changes to metrics in 1.x.
> > > >
> > > > Some background:
> > > >
> > > > - As you may have seen, I'm trying to address some places to improve
> > > > metrics without breaking backward compatibility, but it's limited by
> > > > the interface IMetric, which is exposed to the public.
> > > > - We're working on Storm 2.0.0, and evaluation/adoption of JStorm's
> > > > metrics feature is planned for phase 2, but we don't know the estimated
> > > > release date, and I feel it's not in the near future.
> > > > - We've just released Storm 1.0.x, so I expect the lifetime of Storm
> > > > 1.0 (even 0.9) to be months or even years.
> > > >
> > > > If someone wants to know exactly what is wrong with the current
> > > > metrics feature, please let me know and I will summarize it.
> > > >
> > > > I have some other ideas in mind to relieve the problems with the
> > > > current metrics, so I'm also OK with postponing the renewal of metrics
> > > > to 2.0.0 and applying those workarounds instead. But if we're willing
> > > > to address metrics in 1.x, IMO we can consider breaking backward
> > > > compatibility in 1.x this once.
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > Jungtaek Lim (HeartSaVioR)
> > > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Abhishek Agarwal
> >
>



--
Regards,
Abhishek Agarwal



  
