Indeed, the Python load test data appears to be missing:
http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
How do we typically modify the dashboards?
It looks like we need to edit this JSON file:
https://github.com/apache/beam/blob/8d460db620d2ff1257b0e092218294df15b409a1/.test-infra/metrics/grafana/dashboards/perftests_metrics/ParDo_Load_Tests.json#L81
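Judging from the dashboard URL above, the dashboard already has "sdk" and "processingType" template variables. For orientation, Grafana dashboard JSON files usually define such drop-downs in a "templating" section; below is a rough, generic sketch of what an "sdk" variable with a "python" option could look like. The exact type, structure, and option values in ParDo_Load_Tests.json may differ, so treat this only as an illustration of where such a change would go:

  {
    "templating": {
      "list": [
        {
          "name": "sdk",
          "type": "custom",
          "query": "java,python",
          "current": { "text": "python", "value": "python" },
          "options": [
            { "selected": false, "text": "java", "value": "java" },
            { "selected": true, "text": "python", "value": "python" }
          ]
        }
      ]
    }
  }

The panels' queries would then need to reference $sdk (and $processingType) so that the Python streaming measurements actually show up once the data is written to the database.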
I found some documentation on the deployment:
https://cwiki.apache.org/confluence/display/BEAM/Test+Results+Monitoring
+1 for alerting or weekly emails including performance numbers for fixed
intervals (1d, 1w, 1m, previous release); see the sketch below for how an
alert could be attached to a dashboard panel.
+1 for linking the dashboards in the release guide to allow for a
comparison as part of the release process.
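To make the alerting idea a bit more concrete: in classic Grafana, an alert can live directly inside a panel's JSON definition and be evaluated on a fixed schedule. A rough sketch follows; the threshold (1000, e.g. milliseconds), the query refId "A", and the 7d evaluation window are invented placeholders, and panels driven by template variables typically need a dedicated, fully parameterized alert panel:

  "alert": {
    "name": "ParDo Python streaming latency regression",
    "frequency": "1d",
    "conditions": [
      {
        "type": "query",
        "query": { "params": ["A", "7d", "now"] },
        "reducer": { "type": "avg", "params": [] },
        "evaluator": { "type": "gt", "params": [1000] },
        "operator": { "type": "and" }
      }
    ],
    "noDataState": "no_data",
    "executionErrorState": "keep_state",
    "notifications": []
  }

Notification channels (e.g. an email to the dev list) would then be configured in Grafana and referenced from "notifications".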
As a first step, consolidating all the data seems like the most pressing
problem to solve.
@Kamil, I could use some advice on how to proceed with updating the
dashboards.
-Max
On 22.07.20 20:20, Robert Bradshaw wrote:
On Tue, Jul 21, 2020 at 9:58 AM Thomas Weise <[email protected]> wrote:
It appears that there is coverage missing in the Grafana dashboards
(it could also be that I just don't find it).
For example:
https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
The GBK and ParDo tests have a selection for {batch, streaming} and
SDK. No coverage for streaming and Python? There is also no runner
option currently.
We have seen repeated regressions with streaming, Python, Flink. The
test has been contributed. It would be great if the results could be
covered as part of release verification.
Even better would be if we could use these dashboards (plus alerting or
similar?) to find issues before release verification. It's much easier
to fix things earlier.
Thomas
On Tue, Jul 21, 2020 at 7:55 AM Kamil Wasilewski <[email protected]> wrote:
The prerequisite is that we have all the stats in one place. They seem
to be scattered across http://metrics.beam.apache.org and
https://apache-beam-testing.appspot.com.
Would it be possible to consolidate the two, i.e. use the Grafana-based
dashboard to load the legacy stats?
I'm pretty sure that all dashboards have been moved to
http://metrics.beam.apache.org. Let me know if I missed
something during the migration.
I think we should turn off
https://apache-beam-testing.appspot.com in the near future. New
Grafana-based dashboards have been working seamlessly for some
time now and there's no point in maintaining the older solution.
We'd also avoid ambiguity about where to look for the stats.
Kamil
On Tue, Jul 21, 2020 at 4:17 PM Maximilian Michels <[email protected]> wrote:
> It doesn't support https. I had to add an exception to the HTTPS
> Everywhere extension for "metrics.beam.apache.org".
*facepalm* Thanks Udi! It would always hang on me because I use HTTPS
Everywhere.
> To be explicit, I am supporting the idea of reviewing the release guide
> but not changing the release process for the already in-progress release.
I consider the release guide immutable for the process of a release.
Thus, a change to the release guide can only affect new upcoming
releases, not an in-progress release.
> +1 and I think we can also evaluate whether flaky tests should be
> reviewed as release blockers or not. Some flaky tests would be hiding
> real issues our users could face.
Flaky tests are also worth taking into account when releasing, but a
little harder to find because they may just happen to pass while building
the release. It is possible though if we strictly capture flaky tests
via JIRA and mark them with the Fix Version for the release.
> We keep accumulating dashboards and
> tests that few people care about, so it is probably worth making sure we use
> them or finding a way to alert us of regressions during the release cycle,
> to catch issues even before the RCs.
+1 The release guide should be explicit about which performance test
results to evaluate.
The prerequisite is that we have all the stats in one place. They seem
to be scattered across http://metrics.beam.apache.org and
https://apache-beam-testing.appspot.com.
Would it be possible to consolidate the two, i.e. use the Grafana-based
dashboard to load the legacy stats?
For the evaluation during the release process, I suggest using a
standardized set of performance tests for all runners, e.g.:
- Nexmark
- ParDo (Classic/Portable)
- GroupByKey
- IO
-Max
On 21.07.20 01:23, Ahmet Altay wrote:
>
> On Mon, Jul 20, 2020 at 3:07 PM Ismaël Mejía <[email protected]> wrote:
>
> +1
>
> This is not in the release guide and we should probably re-evaluate
> whether this should be a release-blocking reason.
> Of course, exceptionally, a performance regression could be motivated by
> a correctness fix or a worthwhile refactor, so we should consider this.
>
>
> +1 and I think we can also evaluate whether flaky tests should be
> reviewed as release blockers or not. Some flaky tests would be hiding
> real issues our users could face.
>
> To be explicit, I am supporting the idea of reviewing the release guide
> but not changing the release process for the already in-progress release.
>
>
> We have been tracking and fixing performance regressions multiple
> times found simply by checking the Nexmark tests, including on the
> ongoing 2.23.0 release, so the value is there. Nexmark does not yet cover
> Python and portable runners, so we are probably still missing many
> issues, and it is worth working on this. In any case we should probably
> decide which validations matter. We keep accumulating dashboards and
> tests that few people care about, so it is probably worth making sure we use
> them or finding a way to alert us of regressions during the release cycle,
> to catch issues even before the RCs.
>
>
> I agree. And if we cannot use dashboards/tests in a meaningful way, IMO
> we can remove them. There is not much value in maintaining them if they do
> not provide important signals.
>
>
> On Fri, Jul 10, 2020 at 9:30 PM Udi Meiri <[email protected]> wrote:
> >
> > On Thu, Jul 9, 2020 at 12:48 PM Maximilian Michels <[email protected]> wrote:
> >>
> >> Not yet, I just learned about the migration to a new frontend, including
> >> a new backend (InfluxDB instead of BigQuery).
> >>
> >> > - Are the metrics available on metrics.beam.apache.org?
> >>
> >> Is http://metrics.beam.apache.org online? I was never able to access it.
> >
> >
> > It doesn't support https. I had to add an exception to the HTTPS
> > Everywhere extension for "metrics.beam.apache.org".
> >
> >>
> >>
> >> > - What is the feature delta between using metrics.beam.apache.org
> >> >   (much better UI) and using apache-beam-testing.appspot.com?
> >>
> >> AFAIK it is an ongoing migration and the delta appears to be high.
> >>
> >> > - Can we notice regressions faster than release cadence?
> >>
> >> Absolutely! A report with the latest numbers, including statistics about
> >> the growth of the metrics, would be useful.
> >>
> >> > - Can we get automated alerts?
> >>
> >> I think we could set up a Jenkins job to do this.
> >>
> >> -Max
> >>
> >> On 09.07.20 20:26, Kenneth Knowles wrote:
> >> > Questions:
> >> >
> >> > - Are the metrics available on metrics.beam.apache.org?
> >> > - What is the feature delta between using metrics.beam.apache.org
> >> >   (much better UI) and using apache-beam-testing.appspot.com?
> >> > - Can we notice regressions faster than release cadence?
> >> > - Can we get automated alerts?
> >> >
> >> > Kenn
> >> >
> >> > On Thu, Jul 9, 2020 at 10:21 AM Maximilian Michels <[email protected]> wrote:
> >> >
> >> > Hi,
> >> >
> >> >     We recently saw an increase in latency migrating from Beam 2.18.0 to
> >> >     2.21.0 (Python SDK with Flink Runner). This proved very hard to debug,
> >> >     and it looks like each version between the two versions led to
> >> >     increased latency.
> >> >
> >> >     This is not the first time we have seen issues when migrating; another
> >> >     time we had a decline in checkpointing performance and thus added a
> >> >     checkpointing test [1] and dashboard [2] (see the checkpointing widget).
> >> >
> >> >     That makes me wonder if we should monitor performance (throughput /
> >> >     latency) for basic use cases as part of the release testing. Currently,
> >> >     our release guide [3] mentions running examples but not evaluating the
> >> >     performance. I think it would be good practice to check relevant charts
> >> >     with performance measurements as part of the release process. The
> >> >     release guide should reflect that.
> >> >
> >> > WDYT?
> >> >
> >> > -Max
> >> >
> >> >     PS: Of course, this requires tests and metrics to be available. This PR
> >> >     adds latency measurements to the load tests [4].
> >> >
> >> >
> >> > [1] https://github.com/apache/beam/pull/11558
> >> >     [2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> >> >     [3] https://beam.apache.org/contribute/release-guide/
> >> >     [4] https://github.com/apache/beam/pull/12065
> >> >
>