Hi there,

> Indeed the Python load test data appears to be missing:
>
http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python

I think the only missing test data is from the Python streaming tests, which
are not implemented right now. The batch results are there:
http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=batch&var-sdk=python

As for updating the dashboards, the manual for doing this is here:
https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics#CommunityMetrics-UpdatingDashboards
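
In short, each dashboard is defined by a JSON file under
.test-infra/metrics/grafana/dashboards/ (the ParDo_Load_Tests.json file you
linked is one of them); you edit that file and redeploy as described on the
wiki page. Just as a rough sketch (I'm assuming the sdk/processingType
dropdowns are plain custom Grafana template variables here, so please check
the actual file rather than copying this), the relevant part of the JSON
looks roughly like:

    "templating": {
      "list": [
        {
          "name": "processingType",
          "type": "custom",
          "query": "batch,streaming",
          "options": [
            { "text": "batch", "value": "batch", "selected": true },
            { "text": "streaming", "value": "streaming", "selected": false }
          ],
          "current": { "text": "batch", "value": "batch" }
        }
      ]
    }

Keep in mind that adding or changing an option there only changes the
dropdown; the panels will still show "no data" until the corresponding tests
actually publish metrics.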

I hope this helps,

Michal

On Mon, Jul 27, 2020 at 4:31 PM Maximilian Michels <[email protected]> wrote:

> Indeed the Python load test data appears to be missing:
>
> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
>
> How do we typically modify the dashboards?
>
> It looks like we need to edit this json file:
>
> https://github.com/apache/beam/blob/8d460db620d2ff1257b0e092218294df15b409a1/.test-infra/metrics/grafana/dashboards/perftests_metrics/ParDo_Load_Tests.json#L81
>
> I found some documentation on the deployment:
> https://cwiki.apache.org/confluence/display/BEAM/Test+Results+Monitoring
>
>
> +1 for alerting or weekly emails including performance numbers for fixed
> intervals (1d, 1w, 1m, previous release).
>
> +1 for linking the dashboards in the release guide to allow for a
> comparison as part of the release process.
>
> As a first step, consolidating all the data seems like the most pressing
> problem to solve.
>
> @Kamil I could use some advice on how to proceed with updating the
> dashboards.
>
> -Max
>
> On 22.07.20 20:20, Robert Bradshaw wrote:
> > On Tue, Jul 21, 2020 at 9:58 AM Thomas Weise <[email protected]> wrote:
> >
> >     It appears that there is coverage missing in the Grafana dashboards
> >     (it could also be that I just don't find it).
> >
> >     For example:
> >
> https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> >
> >     The GBK and ParDo tests have a selection for {batch, streaming} and
> >     SDK. No coverage for streaming and python? There is also no runner
> >     option currently.
> >
> >     We have seen repeated regressions with streaming, Python, Flink. The
> >     test has been contributed. It would be great if the results can be
> >     covered as part of release verification.
> >
> >
> > Even better would be if we can use these dashboards (plus alerting or
> > similar?) to find issues before release verification. It's much easier
> > to fix things earlier.
> >
> >
> >     Thomas
> >
> >
> >
> >     On Tue, Jul 21, 2020 at 7:55 AM Kamil Wasilewski
> >     <[email protected]> wrote:
> >
> >             The prerequisite is that we have all the stats in one place.
> >             They seem
> >             to be scattered across http://metrics.beam.apache.org and
> >             https://apache-beam-testing.appspot.com.
> >
> >             Would it be possible to consolidate the two, i.e. use the
> >             Grafana-based
> >             dashboard to load the legacy stats?
> >
> >
> >         I'm pretty sure that all dashboards have been moved to
> >         http://metrics.beam.apache.org. Let me know if I missed
> >         something during the migration.
> >
> >         I think we should turn off
> >         https://apache-beam-testing.appspot.com in the near future. New
> >         Grafana-based dashboards have been working seamlessly for some
> >         time now and there's no point in maintaining the older solution.
> >         We'd also avoid ambiguity in where the stats should be looked
> >         for.
> >
> >         Kamil
> >
> >         On Tue, Jul 21, 2020 at 4:17 PM Maximilian Michels
> >         <[email protected]> wrote:
> >
> >              > It doesn't support https. I had to add an exception to
> >             the HTTPS Everywhere extension for
> >             "metrics.beam.apache.org".
> >
> >             *facepalm* Thanks Udi! It would always hang on me because I
> >             use HTTPS
> >             Everywhere.
> >
> >              > To be explicit, I am supporting the idea of reviewing the
> >             release guide but not changing the release process for the
> >             already in-progress release.
> >
> >             I consider the release guide immutable for the process of a
> >             release.
> >             Thus, a change to the release guide can only affect new
> >             upcoming
> >             releases, not an in-process release.
> >
> >              > +1 and I think we can also evaluate whether flaky tests
> >             should be reviewed as release blockers or not. Some flaky
> >             tests would be hiding real issues our users could face.
> >
> >             Flaky tests are also worth taking into account when
> >             releasing, but a little harder to find because they may just
> >             happen to pass while building the release. It is possible
> >             though if we strictly capture flaky tests via JIRA and mark
> >             them with the Fix Version for the release.
> >
> >              > We keep accumulating dashboards and tests that few people
> >              > care about, so it is probably worth using them or getting
> >              > a way to alert us of regressions during the release cycle
> >              > to catch this even before the RCs.
> >
> >             +1 The release guide should be explicit about which
> >             performance test
> >             results to evaluate.
> >
> >             The prerequisite is that we have all the stats in one place.
> >             They seem
> >             to be scattered across http://metrics.beam.apache.org and
> >             https://apache-beam-testing.appspot.com.
> >
> >             Would it be possible to consolidate the two, i.e. use the
> >             Grafana-based
> >             dashboard to load the legacy stats?
> >
> >             For the evaluation during the release process, I suggest
> >             using a standardized set of performance tests for all
> >             runners, e.g.:
> >
> >             - Nexmark
> >             - ParDo (Classic/Portable)
> >             - GroupByKey
> >             - IO
> >
> >
> >             -Max
> >
> >             On 21.07.20 01:23, Ahmet Altay wrote:
> >              >
> >              > On Mon, Jul 20, 2020 at 3:07 PM Ismaël Mejía
> >              > <[email protected]> wrote:
> >              >
> >              >     +1
> >              >
> >              >     This is not in the release guide and we should
> >              >     probably re-evaluate whether this should be a
> >              >     release-blocking reason. Of course, exceptionally, a
> >              >     performance regression could be motivated by a
> >              >     correctness fix or a worthwhile refactoring, so we
> >              >     should consider this.
> >              >
> >              >
> >              > +1 and I think we can also evaluate whether flaky tests
> >             should be
> >              > reviewed as release blockers or not. Some flaky tests
> >             would be hiding
> >              > real issues our users could face.
> >              >
> >              > To be explicit, I am supporting the idea of reviewing the
> >             release guide
> >              > but not changing the release process for the already
> >             in-progress release.
> >              >
> >              >
> >              >     We have been tracking and fixing performance
> >              >     regressions multiple times, found simply by checking
> >              >     the Nexmark tests, including on the ongoing 2.23.0
> >              >     release, so the value is there. Nexmark does not yet
> >              >     cover Python and portable runners, so we are probably
> >              >     still missing many issues and it is worth working on
> >              >     this. In any case we should probably decide what
> >              >     validations matter. We keep accumulating dashboards
> >              >     and tests that few people care about, so it is
> >              >     probably worth using them or getting a way to alert us
> >              >     of regressions during the release cycle to catch this
> >              >     even before the RCs.
> >              >
> >              >
> >              > I agree. And if we cannot use dashboards/tests in a
> >              > meaningful way, IMO we can remove them. There is not much
> >              > value in maintaining them if they do not provide important
> >              > signals.
> >              >
> >              >
> >              >     On Fri, Jul 10, 2020 at 9:30 PM Udi Meiri
> >              >     <[email protected]> wrote:
> >              >      >
> >              >      > On Thu, Jul 9, 2020 at 12:48 PM Maximilian Michels
> >              >      > <[email protected]> wrote:
> >              >      >>
> >              >      >> Not yet, I just learned about the migration to a
> >             new frontend,
> >              >     including
> >              >      >> a new backend (InfluxDB instead of BigQuery).
> >              >      >>
> >              >      >> >  - Are the metrics available on
> >              >      >> > metrics.beam.apache.org?
> >              >      >>
> >              >      >> Is http://metrics.beam.apache.org online? I was
> >             never able to
> >              >     access it.
> >              >      >
> >              >      >
> >              >      > It doesn't support https. I had to add an
> >              >      > exception to the HTTPS Everywhere extension for
> >              >      > "metrics.beam.apache.org".
> >              >      >
> >              >      >>
> >              >      >>
> >              >      >> >  - What is the feature delta between using
> >              >      >> > metrics.beam.apache.org (much better UI) and
> >              >      >> > using apache-beam-testing.appspot.com?
> >              >      >>
> >              >      >> AFAIK it is an ongoing migration and the delta
> >             appears to be high.
> >              >      >>
> >              >      >> >  - Can we notice regressions faster than
> >             release cadence?
> >              >      >>
> >              >      >> Absolutely! A report with the latest numbers
> >             including
> >              >     statistics about
> >              >      >> the growth of metrics would be useful.
> >              >      >>
> >              >      >> >  - Can we get automated alerts?
> >              >      >>
> >              >      >> I think we could set up a Jenkins job to do this.
> >              >      >>
> >              >      >> -Max
> >              >      >>
> >              >      >> On 09.07.20 20:26, Kenneth Knowles wrote:
> >              >      >> > Questions:
> >              >      >> >
> >              >      >> >   - Are the metrics available on
> >              >      >> > metrics.beam.apache.org?
> >              >      >> >   - What is the feature delta between using
> >              >      >> > metrics.beam.apache.org (much better UI) and
> >              >      >> > using apache-beam-testing.appspot.com?
> >              >      >> >   - Can we notice regressions faster than
> >             release cadence?
> >              >      >> >   - Can we get automated alerts?
> >              >      >> >
> >              >      >> > Kenn
> >              >      >> >
> >              >      >> > On Thu, Jul 9, 2020 at 10:21 AM Maximilian
> >              >      >> > Michels <[email protected]> wrote:
> >              >      >> >
> >              >      >> >     Hi,
> >              >      >> >
> >              >      >> >     We recently saw an increase in latency
> >              >      >> >     migrating from Beam 2.18.0 to 2.21.0 (Python
> >              >      >> >     SDK with Flink Runner). This proved very hard
> >              >      >> >     to debug and it looks like each version
> >              >      >> >     between the two led to increased latency.
> >              >      >> >
> >              >      >> >     This is not the first time we saw issues
> >              >      >> >     when migrating; another time we had a
> >              >      >> >     decline in checkpointing performance and
> >              >      >> >     thus added a checkpointing test [1] and
> >              >      >> >     dashboard [2] (see the checkpointing
> >              >      >> >     widget).
> >              >      >> >
> >              >      >> >     That makes me wonder if we should monitor
> >              >      >> >     performance (throughput / latency) for basic
> >              >      >> >     use cases as part of the release testing.
> >              >      >> >     Currently, our release guide [3] mentions
> >              >      >> >     running examples but not evaluating the
> >              >      >> >     performance. I think it would be good
> >              >      >> >     practice to check relevant charts with
> >              >      >> >     performance measurements as part of the
> >              >      >> >     release process. The release guide should
> >              >      >> >     reflect that.
> >              >      >> >
> >              >      >> >     WDYT?
> >              >      >> >
> >              >      >> >     -Max
> >              >      >> >
> >              >      >> >     PS: Of course, this requires tests and
> >              >      >> >     metrics to be available. This PR adds
> >              >      >> >     latency measurements to the load tests [4].
> >              >      >> >
> >              >      >> >
> >              >      >> >     [1] https://github.com/apache/beam/pull/11558
> >              >      >> >     [2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> >              >      >> >     [3] https://beam.apache.org/contribute/release-guide/
> >              >      >> >     [4] https://github.com/apache/beam/pull/12065
> >              >      >> >
> >              >
> >
>


-- 

Michał Walenia
Polidea <https://www.polidea.com/> | Software Engineer

M: +48 791 432 002
E: [email protected]

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>
