Using queue depth is a good idea for measuring the load of components, though there are scenarios where users want to know the latency of a specific method or processing flow. BTW, we've added support for turning metrics on/off dynamically in JStorm (we actually had this feature in JStorm 2.1.1, but there was a bug in the implementation), i.e. metrics filtering. I think this is a pretty practical requirement.
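For illustration, here is a rough sketch of what dynamic filtering could look like on top of Codahale's MetricFilter interface; the class name and the prefix-based toggle are hypothetical, not our actual JStorm implementation:

    import com.codahale.metrics.Metric;
    import com.codahale.metrics.MetricFilter;

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: a filter whose accepted metric-name prefixes can be
    // changed at runtime (e.g. by a config watcher), so expensive metrics stay
    // off until someone is actively debugging a problem.
    public class DynamicMetricFilter implements MetricFilter {
        private final Set<String> enabledPrefixes = ConcurrentHashMap.newKeySet();

        public void enable(String prefix)  { enabledPrefixes.add(prefix); }
        public void disable(String prefix) { enabledPrefixes.remove(prefix); }

        @Override
        public boolean matches(String name, Metric metric) {
            for (String prefix : enabledPrefixes) {
                if (name.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }
    }

Any Codahale reporter built with .filter(...) would then only emit whatever is enabled at report time, which is roughly the behavior we want for expensive per-method latency metrics.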
I agree with you that it's not a simple 1-to-1 translation; actually, the new metrics requirements look more like a refactoring than incremental improvements.

On Mon, May 23, 2016 at 9:36 PM, Bobby Evans <ev...@yahoo-inc.com.invalid> wrote:

> From JStorm we got the ability to show most of the metrics on the UI, but I think we missed the ability to turn metrics on and off dynamically. This is important because there will be metrics that are expensive to gather, but very important when debugging specific problems.
> I also think we should spend some time thinking about why we are gathering/computing specific metrics, what question we are trying to answer, and whether there is a better way to answer it.
> For example, we collect a lot of latency statistics for a bolt to answer a few different questions, but for the most part the latency stats we have are bad at answering either of those questions.
>
> 1) How overloaded is a particular bolt? aka capacity.
> 2) How does this bolt contribute to the overall processing time in a topology?
>
> For the first one we subsample because it is expensive to compute the latency every time. But in practice I have seen many bolts with a capacity well over 1; I have even seen a 35. How can a bolt be running 3500% of the time? If we use the input queue depth instead, that could give us a much better picture and be a lot less expensive.
>
> For the second one we have talked about adding in more latency measurements (serialization/deserialization time, etc.), but I think we will end up playing whack-a-mole unless we take a holistic approach where we don't just measure small parts of the system, but measure latency from point A to point B and from point B to point C, so there are no gaps.
>
> I just don't think we want to have a simple 1-to-1 translation.
> - Bobby
>
> On Monday, May 23, 2016 5:05 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
>
> Thanks for following up, Abhishek. I agree with you about the consensus. I think you already listed most of my / the community's requirements. I'm adding a few things I have in mind:
> 7. Aggregation at stream level (STORM-1719), and machine level
> 8. A way to subscribe to cluster metrics (STORM-1723)
> 9. Counter stats as non-sampled if it doesn't hurt performance
> 10. More metrics like serialization/deserialization latency, queue status
> I'd also like to see Spout offset information if the Spout implementation can provide it, but that's just my 2 cents.
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Mon, May 23, 2016 at 6:33 PM, Abhishek Agarwal <abhishc...@gmail.com> wrote:
>
> So I am assuming that there is a general consensus on adding a new API for metrics and gradually phasing out the old one. If yes, maybe we can work toward the finer details of how to maintain two APIs as well as the design of the new API.
> Jungtaek, it would be better to summarize the requirements and let others add what they feel is missing. Some asks I have seen:
> 1. Aggregation at component level (Average, Sum, etc.)
> 2. Blacklist/whitelist
> 3. Allow only numbers for values
> 4. Efficient routing of built-in metrics to the UI (currently they get tagged along with executor heartbeats, which puts pressure on ZooKeeper)
> 5. Worker/JVM level metrics which are not owned by a particular component
> 6. Percentiles for latency metrics such as p99, p95, etc.
> Not all of them may be considered. Please add anything I might have missed.
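To make the queue-depth and percentile points above concrete (Bobby's capacity question, and item 6 in the list), here is a minimal Codahale-based sketch; the metric names and the queue are stand-ins for illustration, not Storm's real internals:

    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Snapshot;
    import com.codahale.metrics.Timer;

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class BoltMetricsSketch {
        public static void main(String[] args) {
            MetricRegistry registry = new MetricRegistry();
            BlockingQueue<Object> receiveQueue = new LinkedBlockingQueue<>(1024);

            // 1) Load: queue depth is a single O(1) read, no sampling required.
            Gauge<Integer> queueDepth = receiveQueue::size;
            registry.register("executor.receive.queue.depth", queueDepth);

            // 2) Latency contribution: time each execute() call and read
            //    percentiles (p95/p99) from the snapshot.
            Timer executeLatency = registry.timer("bolt.execute.latency");
            try (Timer.Context ignored = executeLatency.time()) {
                // bolt.execute(tuple) would run here
            }

            Snapshot snap = executeLatency.getSnapshot();
            System.out.printf("p95=%.2f ms, p99=%.2f ms%n",
                    snap.get95thPercentile() / 1_000_000.0,   // snapshot values are in nanoseconds
                    snap.get99thPercentile() / 1_000_000.0);
        }
    }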
> On Fri, May 20, 2016 at 5:57 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
>
> > Personally I'm also in favor of maintaining the old API (but deprecated) and adding a new API. It's the ideal way; it's what many projects try to do, including the other project I'm also maintaining.
> >
> > And I also prefer that the current metrics feature goes away in the next major release. In general, before each major release we should discuss which classes/features to drop. I think we forgot this when we released 1.0.0, and the deprecated things are all still alive.
> >
> > I'm also in favor of dropping the support for a custom frequency for each metric. I guess we don't need to worry about the mapping since Storm's internal metrics will be moved to the new metrics feature. While some behavioral changes could occur (for example, IMetricsConsumer could be changed to no longer receive Storm's built-in metrics), it will not break backward compatibility from the API side anyway.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Fri, May 20, 2016 at 12:57 AM, Abhishek Agarwal <abhishc...@gmail.com> wrote:
> >
> > > Sounds good. Having two separate metric reporters may be confusing, but it is better than breaking the client code.
> > >
> > > The Codahale library allows users to specify the frequency per reporter instance. Storm, on the other hand, allows a different reporting frequency for each metric. How will that mapping work? I am OK with dropping the support for a custom frequency for each metric. Internal metrics in Storm use the same reporting frequency anyway.
> > >
> > > On Thu, May 19, 2016 at 9:04 PM, Bobby Evans <ev...@yahoo-inc.com.invalid> wrote:
> > >
> > > > I personally would like to see that change happen differently for the two branches.
> > > > On 1.x we add a new API for both reporting and collecting metrics, in parallel to the old API. We leave IMetric and IMetricsConsumer in place but deprecated. As we move internal metrics over from the old interface to the new one, we either keep versions of the old ones in place or we provide a translation shim going from the new to the old.
> > > >
> > > > In 2.x either the old way is gone completely or it is off by default. I prefer gone completely.
> > > >
> > > > If we go off of dropwizard/codahale metrics, or a layer around them like was discussed previously, it seems fairly straightforward to take some of our current metrics that all trigger at the same interval and set up a reporter that can translate them into the format that was reported previously.
> > > > In 1.x, to get a full picture of what is happening in your topology you may need two separate reporters, one for the new metrics and one for the old, but it should only be for a short period of time. - Bobby
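On the frequency question and the two-reporters transition above: Codahale fixes the reporting interval per reporter rather than per metric, so the 1.x picture would roughly be two reporters on their own schedules over one registry. A sketch, with arbitrary reporter choices and intervals:

    import com.codahale.metrics.ConsoleReporter;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Slf4jReporter;
    import org.slf4j.LoggerFactory;

    import java.util.concurrent.TimeUnit;

    public class DualReporterSketch {
        public static void main(String[] args) {
            MetricRegistry registry = new MetricRegistry();

            // "New" path: everything in the registry reported every 10 seconds.
            ConsoleReporter newPath = ConsoleReporter.forRegistry(registry)
                    .convertRatesTo(TimeUnit.SECONDS)
                    .convertDurationsTo(TimeUnit.MILLISECONDS)
                    .build();
            newPath.start(10, TimeUnit.SECONDS);

            // "Old" path during the 1.x transition: a second reporter on its own
            // schedule; in practice this could be a custom ScheduledReporter that
            // translates registry contents into the previously reported format.
            Slf4jReporter oldPath = Slf4jReporter.forRegistry(registry)
                    .outputTo(LoggerFactory.getLogger("metrics.legacy"))
                    .build();
            oldPath.start(60, TimeUnit.SECONDS);
        }
    }

Per-metric custom frequencies don't map onto this model directly; the closest equivalents would probably be separate registries/reporters per interval, or a reporter-level filter, which is consistent with dropping per-metric frequency support.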
> > > > On Thursday, May 19, 2016 1:00 AM, Cody Innowhere <e.neve...@gmail.com> wrote:
> > > >
> > > > If we want to refactor the metrics system, I think we may have to incur breaking changes. We can make it backward compatible, but that means we may need to build an adapter layer on top of metrics, or a lot of "if...else..." which might be ugly; either way, it might be a pain to maintain the code.
> > > > So I prefer making breaking changes if we want to build a new metrics system. I'm OK with moving the JStorm metrics migration phase forward to 1.x, and I'm happy to share our design & experiences.
> > > >
> > > > On Thu, May 19, 2016 at 11:12 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > I'd like to hear your opinions on breaking changes to metrics for 1.x.
> > > > >
> > > > > Some background here:
> > > > >
> > > > > - As you may have seen, I'm trying to address some places to improve metrics without breaking backward compatibility, but it's limited due to the IMetric interface, which is exposed to the public.
> > > > > - We're working on Storm 2.0.0, and evaluation/adoption of JStorm's metrics feature is planned for phase 2, but we don't know the estimated release date, and I feel it's not in the near future.
> > > > > - We've just released Storm 1.0.x, so I expect the lifetime of Storm 1.0 (even 0.9) to be months or even years.
> > > > >
> > > > > If someone wants to know exactly which aspects of the current metrics feature matter, please let me know and I will summarize.
> > > > >
> > > > > I have other ideas in mind to relieve some problems with the current metrics, so I'm also OK with postponing the metrics renewal to 2.0.0 and applying those workaround ideas. But if we're willing to address metrics in 1.x, IMO we can consider breaking backward compatibility from 1.x for once.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > Jungtaek Lim (HeartSaVioR)
> > >
> > > --
> > > Regards,
> > > Abhishek Agarwal
>
> --
> Regards,
> Abhishek Agarwal
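Finally, on the adapter-layer / translation-shim idea discussed above, a purely hypothetical bridge that exposes a Codahale Counter through the existing IMetric interface, so old IMetricsConsumer plumbing could keep working during a deprecation window (the class name and the delta-since-last-poll semantics are assumptions, not a proposed design):

    import com.codahale.metrics.Counter;
    import org.apache.storm.metric.api.IMetric;

    // Hypothetical bridge: a Codahale Counter surfaced through the old
    // IMetric API while both systems coexist in 1.x.
    public class CounterToIMetricAdapter implements IMetric {
        private final Counter counter;
        private long lastSeen = 0L;

        public CounterToIMetricAdapter(Counter counter) {
            this.counter = counter;
        }

        @Override
        public Object getValueAndReset() {
            // Codahale counters are not reset by readers, so report the delta
            // since the previous poll instead of resetting the counter itself.
            long current = counter.getCount();
            long delta = current - lastSeen;
            lastSeen = current;
            return delta;
        }
    }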