Re: Metrics Replacement

dev1 Mon, 27 Sep 2021 15:24:47 -0700

The reporting of the rate vs the absolute count is likely because the logging 
registry is currently implemented as a StepMeterRegistry 
(https://javadoc.io/doc/io.micrometer/micrometer-core/latest/io/micrometer/core/instrument/step/StepMeterRegistry.html)

"Registry that step-normalizes counts and sums to a rate/second over the 
publishing interval"

Under the covers, the counter just keeps a running count - the registry 
reports the measured value in the form expected by the target metrics system.
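
As a rough illustration of what "step-normalizes counts to a rate/second" means for the numbers in Dave's test (a count of 1 over a 5 second publishing interval), here is a minimal plain-Java sketch. It is not Micrometer code; the class and method names are made up for this example, and the 5 second step is taken from Dave's description below.

```java
import java.time.Duration;

// Illustration only: mimics the arithmetic of step normalization,
// it does not use or reproduce Micrometer's implementation.
public class StepRateSketch {

    // Converts a count accumulated during one publishing step into a per-second rate.
    public static double ratePerSecond(double countInStep, Duration step) {
        return countInStep / (step.toMillis() / 1000.0);
    }

    // Recovers the count for one step from a reported per-second rate.
    public static double countInStep(double ratePerSecond, Duration step) {
        return ratePerSecond * (step.toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        Duration step = Duration.ofSeconds(5); // assumed publishing interval
        double rate = ratePerSecond(1, step);  // one increment during the step
        System.out.println("throughput=" + rate + "/s");
        System.out.println("count in step=" + countInStep(rate, step));
    }
}
```

Tooling that expects an absolute count from a step-based registry would have to multiply the reported rate by the step length, as countInStep does, or use a registry that reports cumulative values.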

  1. Based on the Micrometer output, it appears that even if we can get the 
names to match (or document appropriately), users may still have to change 
their tooling based on the values that are being reported.

This unfortunately seems likely to happen - but we should be able to explain 
what is being reported (or even better refer to external docs) - the creation 
of a micrometer instrumentation meter allows for a description so we should be 
able to either automate the description gathering or provide a self-describing 
set of metrics.  We would need to provide a manual mapping of old / new names.
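
One way the old / new mapping could be kept alongside the descriptions is a small table that is rendered into the docs. The sketch below is plain Java and only an assumption about how this might look; the class name and the metric names in main are hypothetical and are not actual Accumulo metric names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-maintained mapping of old metric names to new ones.
public class MetricNameMappingSketch {

    // One documented metric: the previous name, the new name, and a description.
    private static final class Entry {
        final String oldName;
        final String newName;
        final String description;

        Entry(String oldName, String newName, String description) {
            this.oldName = oldName;
            this.newName = newName;
            this.description = description;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Adds one mapping and returns this object so calls can be chained.
    public MetricNameMappingSketch add(String oldName, String newName, String description) {
        entries.add(new Entry(oldName, newName, description));
        return this;
    }

    // Renders one "old -> new : description" line per entry, in insertion order.
    public String render() {
        StringBuilder sb = new StringBuilder();
        for (Entry e : entries) {
            sb.append(e.oldName).append(" -> ").append(e.newName)
              .append(" : ").append(e.description).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical names, for illustration only.
        MetricNameMappingSketch mapping = new MetricNameMappingSketch()
            .add("ExampleQueuedCount", "example.queue.size", "Number of queued example tasks")
            .add("ExampleLatency", "example.latency", "Time taken by an example operation");
        System.out.print(mapping.render());
    }
}
```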

Some systems (like Prometheus) will create descriptive statistics from the raw 
measurements.  If a metric has valid reason to report useful summary 
statistics, then another meter may be a better fit (either a micrometer Timer 
or DistributionSummary).  There is a memory cost for accumulating summary 
statistics so it may not always be appropriate for every metric.
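
To make the memory trade-off concrete, here is a minimal plain-Java sketch of the kind of fixed-size state a summary-style meter needs for count, total, mean and max. It is an illustration under my own assumptions, not Micrometer's implementation, and the class name is made up. Percentiles or histograms would need extra buckets or retained samples on top of this, which is where the additional memory goes.

```java
// Illustration only: constant-memory summary statistics (count, total, mean, max).
public class RunningSummarySketch {
    private long count;
    private double total;
    private double max;

    // Records one observed amount, updating the running aggregates.
    public void record(double amount) {
        count++;
        total += amount;
        if (count == 1 || amount > max) {
            max = amount;
        }
    }

    public long count() {
        return count;
    }

    public double total() {
        return total;
    }

    // Mean of all recorded amounts, or 0 when nothing has been recorded.
    public double mean() {
        return count == 0 ? 0.0 : total / count;
    }

    // Largest recorded amount, or 0 when nothing has been recorded.
    public double max() {
        return count == 0 ? 0.0 : max;
    }

    public static void main(String[] args) {
        RunningSummarySketch summary = new RunningSummarySketch();
        summary.record(32); // a single sample, as in the quantile{} line of Dave's test output
        System.out.println("mean=" + summary.mean() + " max=" + summary.max());
    }
}
```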

  2. It's possible that we could take a different approach, where we continue 
to use Hadoop Metrics2 internally and attempt to write a Micrometer sink for 
the Metrics2 framework for 2.x and move to Micrometer for the next major 
release. Based on the Hadoop JIRA, it does not appear that they have plans to 
move away from this framework.

In my opinion, this would not be worth the effort.

Ed Coleman

________________________________
From: Dave Marion <dmario...@gmail.com>
Sent: Monday, September 27, 2021 4:52 PM
To: dev@accumulo.apache.org <dev@accumulo.apache.org>
Subject: Re: Metrics Replacement

I created a test[1] to see the differences in the output. In this test I
create equivalent metric objects and output them via their respective
logging sink.

For Hadoop Metrics, it created:

1632775059897 ctx.record: Context=ctx, ProcessName=testProcess, counter=1,
gauge=2, QuantileNumI/O=0, Quantile50thPercentileLatency=0,
Quantile75thPercentileLatency=0, Quantile90thPercentileLatency=0,
Quantile95thPercentileLatency=0, Quantile99thPercentileLatency=0,
StatNumI/O=10, StatAvgLatency=10.0, StatStdevLatency=31.622776601683793,
StatIMinLatency=3.4028234663852886E38,
StatIMaxLatency=1.401298464324817E-45,
StatMinLatency=3.4028234663852886E38, StatMaxLatency=1.401298464324817E-45,
StatINumI/O=10

For Micrometer, it created:

[logging-metrics-publisher] INFO
 io.micrometer.core.instrument.logging.LoggingMeterRegistry [] - counter{}
throughput=0.2/s
[logging-metrics-publisher] INFO
 io.micrometer.core.instrument.logging.LoggingMeterRegistry [] - gauge{}
value=10
[logging-metrics-publisher] INFO
 io.micrometer.core.instrument.logging.LoggingMeterRegistry [] - stat{}
throughput=0.2/s mean=0.01s max=0.01s
[logging-metrics-publisher] INFO
 io.micrometer.core.instrument.logging.LoggingMeterRegistry [] - quantile{}
throughput=0.2/s mean=32 max=32

You will see a couple of differences here:

  1. For counters, it appears that Micrometer is dividing the value (1) by
the number of seconds (5), but Hadoop does not. Micrometer talks about this
some at https://micrometer.io/docs/concepts#_counters
  2. Hadoop Metrics2 Stat objects compute a bunch of statistics (avg,
stddev, min, max, IntervalMin and IntervalMax); Micrometer does not.
  3. I tried to use a Micrometer DistributionSummary as a replacement for
Hadoop Metrics2 Quantile object. It's possible I need to use a different
object or configure it differently.

Some thoughts:

  1. Based on the Micrometer output, it appears that even if we can get the
names to match (or document appropriately), users may still have to change
their tooling based on the values that are being reported.
  2. It's possible that we could take a different approach, where we
continue to use Hadoop Metrics2 internally and attempt to write a
Micrometer sink for the Metrics2 framework for 2.x and move to Micrometer
for the next major release. Based on the Hadoop JIRA, it does not appear
that they have plans to move away from this framework.

[1] https://gist.github.com/dlmarion/67e0ed8df320633d5af23ae00d965183

On Thu, Sep 23, 2021 at 1:00 PM Christopher <ctubb...@apache.org> wrote:

> +1 to everything Ed wrote. :)
>
> On Wed, Sep 22, 2021 at 10:03 AM <d...@etcoleman.com> wrote:
> >
> > The information provided by micrometer instrumentation should be
> consistent with the values produced by Hadoop metrics.  Things like gauges
> and counters are straightforward and should match 1:1.  Things that
> collect / calculate statistics may be slightly different due to implementation
> details - say the way binning for histograms is performed - they will still
> be mathematically correct and the values they report should still be
> consistent, but they might be "different".
> >
> > An issue with metrics is that each collection system seems to have
> slight variations in the way they want things collected and reported.
> Micrometer supports various monitoring systems and a way to implement
> others if a particular system is not currently supported.  In micrometer,
> each registry provides for converting / supporting a specific monitoring
> system.  This includes things like name conversions, rate aggregation
> (client vs. server) and push vs. pull. Our current metrics were named with
> a specific metrics system and a naming convention - rather than trying to
> match our current names exactly we could follow the micrometer naming
> convention and then rely on the micrometer registry conversion to match the
> user's defined collection system.
> >
> > Adopting and following the micrometer conventions should increase our
> compatibility with other collection systems and ease user implementations.
> In places where this might result in a name change, I think we should
> prioritize consistency and normalizing names with conventions. That would
> seem to provide the least surprise to end users and increase their
> flexibility to meet their needs. We should also look to take advantage of
> tagging to allow for aggregation and dimensional drill down to increase
> utility to end users. To the extent that this changes a reported metric
> name, the increased utility and flexibility provided would benefit
> end-users.  While any name change would increase friction for current
> metric consumers, the degree of friction seems independent of the amount of
> change - any change might be disruptive.  I am not advocating that we
> should change names just to change them - rather we should seek to provide
> uniform names and consistent naming conventions across our codebase as the
> primary consideration and allow the reported names to fall out from there.
> >
> > The configuration of each monitoring system will depend on the system
> chosen by the user.  We should provide a select set of examples (I advocate
> Prometheus, some flavor of statsd and logging) to guide users if one of
> those does not fit their requirements and they elect to use a different
> micrometer module / collection system.
> >
> > I agree that we should supply documentation mapping current names to
> their micrometer equivalents -  the specific name reported will be
> dependent on the conversions performed by the target system - but those
> should be documented in each module and are not within our scope.
> >
> > -----Original Message-----
> > From: Keith Turner <ke...@deenlo.com>
> > Sent: Tuesday, September 21, 2021 5:07 PM
> > To: Accumulo Dev List <dev@accumulo.apache.org>
> > Subject: Re: Metrics Replacement
> >
> > On Tue, Sep 21, 2021 at 3:45 PM Dave Marion <dmario...@gmail.com> wrote:
> > >
> > > There is a WIP pull request against 2.1.0-SNAPSHOT for replacing the
> > > Hadoop
> > > Metrics2 framework with Micrometer[1]. Micrometer suggests using a
> > > naming pattern[2] for the metrics internally where words are all
> > > lowercase separated by a period. Micrometer output formats then
> > > rewrite the metric names to the destination specific format. It's
> > > possible that we may not be able to produce metrics in the same exact
> > > way as the Hadoop Metrics2
> >
> > Is it only the naming pattern that will cause incompatibility, or is it
> more than that?  Like would a timer, gauge, etc. in micrometer produce
> different information/metrics than a timer, gauge, etc. in hadoop metrics?  I
> suspect these would differ and that would also impact compat.  Will the way
> in which accumulo is configured to report metrics also change?  I can't
> imagine it would be the same, but I have not looked at the PR.
> >
> > Can you provide an example of a naming incompat where it has to change?
> >
> > > framework. Metrics are not part of the public API, but we do want to
> > > try and retain as much backwards compatibility as possible. In the
> > > event that we cannot get that compatibility it has been suggested that
> > > we document how things are different. As I have limited knowledge of
> > > how the metrics are
> >
> > Is there a reasonable path to achieving compatibility?  If not, it seems
> like documenting what has changed is a good way to go.  Could possibly
> explain it in detail in the 2.1.0 release notes and have a link to that in
> the user manual.
> >
> > > being used today, I'm looking for some feedback from the community as
> > > to how painful it would be if metric names changed in a minor release.
> > >
> > > [1] https://micrometer.io/
> > > [2] https://micrometer.io/docs/concepts#_naming_meters
> >
>
