Indeed, the Python load test data appears to be missing:
http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
How do we typically modify the dashboards?
It looks like we need to edit this JSON file:
https://github.com/apache/beam/blob/8d460db620d2ff1257b0e092218294df15b409a1/.test-infra/metrics/grafana/dashboards/perftests_metrics/ParDo_Load_Tests.json#L81
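Judging from the dashboard URL above, the dashboard already has "sdk" and "processingType" template variables. For orientation, Grafana dashboard JSON files usually define such drop-downs in a "templating" section; below is a rough, generic sketch of what an "sdk" variable with a "python" option could look like. The exact type, structure, and option values in ParDo_Load_Tests.json may differ, so treat this only as an illustration of where such a change would go:

  {
    "templating": {
      "list": [
        {
          "name": "sdk",
          "type": "custom",
          "query": "java,python",
          "current": { "text": "python", "value": "python" },
          "options": [
            { "selected": false, "text": "java", "value": "java" },
            { "selected": true, "text": "python", "value": "python" }
          ]
        }
      ]
    }
  }

The panels' queries would then need to reference $sdk (and $processingType) so that the Python streaming measurements actually show up once the data is written to the database.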
I found some documentation on the deployment:
https://cwiki.apache.org/confluence/display/BEAM/Test+Results+Monitoring
+1 for alerting or weekly emails including performance numbers for fixed
intervals (1d, 1w, 1m, previous release); see the sketch below for how an
alert could be attached to a dashboard panel.
+1 for linking the dashboards in the release guide to allow for a
comparison as part of the release process.
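To make the alerting idea a bit more concrete: in classic Grafana, an alert can live directly inside a panel's JSON definition and be evaluated on a fixed schedule. A rough sketch follows; the threshold (1000, e.g. milliseconds), the query refId "A", and the 7d evaluation window are invented placeholders, and panels driven by template variables typically need a dedicated, fully parameterized alert panel:

  "alert": {
    "name": "ParDo Python streaming latency regression",
    "frequency": "1d",
    "conditions": [
      {
        "type": "query",
        "query": { "params": ["A", "7d", "now"] },
        "reducer": { "type": "avg", "params": [] },
        "evaluator": { "type": "gt", "params": [1000] },
        "operator": { "type": "and" }
      }
    ],
    "noDataState": "no_data",
    "executionErrorState": "keep_state",
    "notifications": []
  }

Notification channels (e.g. an email to the dev list) would then be configured in Grafana and referenced from "notifications".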
As a first step, consolidating all the data seems like the most pressing
problem to solve.
@Kamil, I could use some advice on how to proceed with updating the
dashboards.
-Max
On 22.07.20 20:20, Robert Bradshaw wrote:
On Tue, Jul 21, 2020 at 9:58 AM Thomas Weise <[email protected]> wrote:
It appears that there is coverage missing in the Grafana dashboards
(it could also be that I just don't find it).
For example:
https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
The GBK and ParDo tests have a selection for {batch, streaming} and
SDK. No coverage for streaming and Python? There is also no runner
option currently.
We have seen repeated regressions with streaming, Python, Flink. The
test has been contributed. It would be great if the results could be
covered as part of release verification.
Even better would be if we could use these dashboards (plus alerting or
similar?) to find issues before release verification. It's much easier
to fix things earlier.
Thomas
On Tue, Jul 21, 2020 at 7:55 AM Kamil Wasilewski <[email protected]> wrote:
The prerequisite is that we have all the stats in one place. They seem
to be scattered across http://metrics.beam.apache.org and
https://apache-beam-testing.appspot.com.
Would it be possible to consolidate the two, i.e. use the Grafana-based
dashboard to load the legacy stats?
I'm pretty sure that all dashboards have been moved to
http://metrics.beam.apache.org. Let me know if I missed
something during the migration.
I think we should turn off
https://apache-beam-testing.appspot.com in the near future. New
Grafana-based dashboards have been working seamlessly for some
time now and there's no point in maintaining the older solution.
We'd also avoid ambiguity about where to look for the stats.
Kamil
On Tue, Jul 21, 2020 at 4:17 PM Maximilian Michels <[email protected]> wrote:
> It doesn't support https. I had to add an exception to the HTTPS
> Everywhere extension for "metrics.beam.apache.org".
*facepalm* Thanks Udi! It would always hang on me because I use HTTPS
Everywhere.
> To be explicit, I am supporting the idea of reviewing the release guide
> but not changing the release process for the already in-progress release.
I consider the release guide immutable for the process of a release.
Thus, a change to the release guide can only affect new upcoming
releases, not an in-progress release.
> +1 and I think we can also evaluate whether flaky tests should be
> reviewed as release blockers or not. Some flaky tests would be hiding
> real issues our users could face.
Flaky tests are also worth taking into account when releasing, but a
little harder to find because they may just happen to pass while building
the release. It is possible though if we strictly capture flaky tests
via JIRA and mark them with the Fix Version for the release.
> We keep accumulating dashboards and
> tests that few people care about, so it is probably worth making sure we use
> them or finding a way to alert us of regressions during the release cycle,
> to catch issues even before the RCs.
+1 The release guide should be explicit about which performance test
results to evaluate.
The prerequisite is that we have all the stats in one place. They seem
to be scattered across http://metrics.beam.apache.org and
https://apache-beam-testing.appspot.com.
Would it be possible to consolidate the two, i.e. use the Grafana-based
dashboard to load the legacy stats?
For the evaluation during the release process, I suggest using a
standardized set of performance tests for all runners, e.g.:
- Nexmark
- ParDo (Classic/Portable)
- GroupByKey
- IO
-Max
On 21.07.20 01:23, Ahmet Altay wrote:
>
> On Mon, Jul 20, 2020 at 3:07 PM Ismaël Mejía <[email protected]> wrote:
>
> +1
>
> This is not in the release guide and we should probably re-evaluate
> whether this should be a release-blocking reason.
> Of course, exceptionally, a performance regression could be motivated by
> a correctness fix or a worthwhile refactor, so we should consider this.
>
>
> +1 and I think we can also evaluate whether flaky tests should be
> reviewed as release blockers or not. Some flaky tests would be hiding
> real issues our users could face.
>
> To be explicit, I am supporting the idea of reviewing the release guide
> but not changing the release process for the already in-progress release.
>
>
> We have been tracking and fixing performance regressions multiple
> times found simply by checking the Nexmark tests, including on the
> ongoing 2.23.0 release, so the value is there. Nexmark does not yet cover
> Python and portable runners, so we are probably still missing many
> issues, and it is worth working on this. In any case we should probably
> decide which validations matter. We keep accumulating dashboards and
> tests that few people care about, so it is probably worth making sure we use
> them or finding a way to alert us of regressions during the release cycle,
> to catch issues even before the RCs.
>
>
> I agree. And if we cannot use dashboards/tests in a meaningful way, IMO
> we can remove them. There is not much value in maintaining them if they do
> not provide important signals.
>
>
> On Fri, Jul 10, 2020 at 9:30 PM Udi Meiri <[email protected]> wrote:
> >
> > On Thu, Jul 9, 2020 at 12:48 PM Maximilian Michels <[email protected]> wrote:
> >>
> >> Not yet, I just learned about the migration to a new frontend, including
> >> a new backend (InfluxDB instead of BigQuery).
> >>
> >> > - Are the metrics available on metrics.beam.apache.org?
> >>
> >> Is http://metrics.beam.apache.org online? I was never able to access it.
> >
> >
> > It doesn't support https. I had to add an exception to the HTTPS
> > Everywhere extension for "metrics.beam.apache.org".
> >
> >>
> >>
> >> > - What is the feature delta between using metrics.beam.apache.org
> >> >   (much better UI) and using apache-beam-testing.appspot.com?
> >>
> >> AFAIK it is an ongoing migration and the delta appears to be high.
> >>
> >> > - Can we notice regressions faster than release cadence?
> >>
> >> Absolutely! A report with the latest numbers, including statistics about
> >> the growth of the metrics, would be useful.
> >>
> >> > - Can we get automated alerts?
> >>
> >> I think we could set up a Jenkins job to do this.
> >>
> >> -Max
> >>
> >> On 09.07.20 20:26, Kenneth Knowles wrote:
> >> > Questions:
> >> >
> >> > - Are the metrics available on metrics.beam.apache.org?
> >> > - What is the feature delta between using metrics.beam.apache.org
> >> >   (much better UI) and using apache-beam-testing.appspot.com?
> >> > - Can we notice regressions faster than release cadence?
> >> > - Can we get automated alerts?
> >> >
> >> > Kenn
> >> >
> >> > On Thu, Jul 9, 2020 at 10:21 AM Maximilian Michels <[email protected]> wrote:
> >> >
> >> > Hi,
> >> >
> >> >     We recently saw an increase in latency migrating from Beam 2.18.0 to
> >> >     2.21.0 (Python SDK with Flink Runner). This proved very hard to debug,
> >> >     and it looks like each version between the two versions led to
> >> >     increased latency.
> >> >
> >> >     This is not the first time we have seen issues when migrating; another
> >> >     time we had a decline in checkpointing performance and thus added a
> >> >     checkpointing test [1] and dashboard [2] (see the checkpointing widget).
> >> >
> >> >     That makes me wonder if we should monitor performance (throughput /
> >> >     latency) for basic use cases as part of the release testing. Currently,
> >> >     our release guide [3] mentions running examples but not evaluating the
> >> >     performance. I think it would be good practice to check relevant charts
> >> >     with performance measurements as part of the release process. The
> >> >     release guide should reflect that.
> >> >
> >> > WDYT?
> >> >
> >> > -Max
> >> >
> >> >     PS: Of course, this requires tests and metrics to be available. This PR
> >> >     adds latency measurements to the load tests [4].
> >> >
> >> >
> >> > [1] https://github.com/apache/beam/pull/11558
> >> >     [2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> >> >     [3] https://beam.apache.org/contribute/release-guide/
> >> >     [4] https://github.com/apache/beam/pull/12065
> >> >
>