+1, thanks for driving this!

On a side note, can we also ensure that a performance summary report for
Flink major version upgrades is in release notes, once this infrastructure
becomes mature? From the user perspective, it would be nice to know what
the expected (or unexpected) regressions in a major version upgrade are.
I've seen the community do something like this before (e.g. the major
rocksdb version bump in 1.14?) and it was quite valuable to know that
upfront!

Best,
Mason

On Fri, Oct 28, 2022 at 1:46 AM weijie guo <guoweijieres...@gmail.com>
wrote:

> Thanks Yanfei for driving this.
>
> It makes performance regressions easy to spot. I have recently made some
> improvements to the scheduling-related parts, so your work is very
> important for ensuring that these changes do not cause unexpected
> problems.
>
> Best regards,
>
> Weijie
>
>
> Congxian Qiu <qcx978132...@gmail.com> wrote on Fri, Oct 28, 2022 at 16:03:
>
> > Thanks for driving this and making the performance monitoring public;
> > this lets us find and resolve performance problems quickly.
> >
> > Looking forward to the workflow and detailed descriptions of
> > flink-dev-benchmarks.
> >
> > Best,
> > Congxian
> >
> >
> > Yun Tang <myas...@live.com> wrote on Thu, Oct 27, 2022 at 12:41:
> >
> > > Thanks, Yanfei, for driving this to monitor the performance in the
> > > Apache Flink Slack Channel.
> > >
> > > Look forward to the workflow and detailed descriptions of
> > > flink-dev-benchmarks.
> > >
> > > Best
> > > Yun Tang
> > > ________________________________
> > > From: Hangxiang Yu <master...@gmail.com>
> > > Sent: Thursday, October 27, 2022 10:59
> > > To: dev@flink.apache.org <dev@flink.apache.org>
> > > Subject: Re: [ANNOUNCE] Performance Daily Monitoring Moved from
> > > Ververica to Apache Flink Slack Channel
> > >
> > > Hi, Yanfei.
> > > Thanks for driving this.
> > > It could help us detect and resolve regressions quickly and
> > > officially.
> > > I'd like to join as a maintainer.
> > > Looking forward to the workflow.
> > >
> > > On Wed, Oct 26, 2022 at 5:18 PM Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > >
> > > > Thanks, Yanfei, for driving this and making the performance
> > > > monitoring publicly available.
> > > >
> > > > Looking forward to seeing the workflow, and more details as Martijn
> > > > mentioned.
> > > >
> > > > Best
> > > > Yuan
> > > >
> > > > On Wed, Oct 26, 2022 at 2:59 PM Martijn Visser <martijnvis...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi Yanfei Lei,
> > > > >
> > > > > Thanks for setting this up! It would be interesting to also know
> > > > > which aspects of Flink are monitored for "performance". I'm assuming
> > > > > there are specific pieces of functionality that are performance
> > > > > tested, but it would be great if this were written down somewhere
> > > > > (next to a procedure for how to detect a regression and what the
> > > > > next steps should be).
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > > On Wed, Oct 26, 2022 at 8:21 AM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > >
> > > > > > Hi yanfei,
> > > > > >
> > > > > > Thanks for driving this! It's a great help.
> > > > > >
> > > > > > I would like to join as a maintainer.
> > > > > >
> > > > > > Best,
> > > > > > Zakelly
> > > > > >
> > > > > > On Wed, Oct 26, 2022 at 11:32 AM yanfei lei <fredia...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > As discussed earlier, we planned to create a benchmark channel in
> > > > > > > the Apache Flink Slack[1], but the plan was shelved for a while[2].
> > > > > > > So I went on with this work and created the #flink-dev-benchmarks
> > > > > > > channel for performance regression notifications.
> > > > > > >
> > > > > > > We have a regression report script[3] that runs daily and sends a
> > > > > > > notification to the Slack channel when the last few benchmark
> > > > > > > results are significantly worse than the baseline.
> > > > > > > Note that regressions are detected by a simple script, which may
> > > > > > > produce false positives and false negatives. Also, all benchmarks
> > > > > > > are executed on one physical machine[4] provided by
> > > > > > > Ververica (Alibaba)[5], so hardware issues may affect performance,
> > > > > > > as in "[FLINK-18614] Performance regression 2020.07.13"[6].
> > > > > > >
> > > > > > > After the migration, we need a procedure for watching over the
> > > > > > > performance of the Flink code base together. For example, if a
> > > > > > > regression occurs, someone needs to investigate the cause and
> > > > > > > resolve the problem. In the past, this procedure was maintained
> > > > > > > internally within Ververica, but we think making it public would
> > > > > > > benefit everyone. I volunteer to serve as one of the initial
> > > > > > > maintainers, and would be glad if more contributors joined me. I
> > > > > > > will also prepare some guidelines to help others get familiar with
> > > > > > > the workflow, and I will start a new thread to discuss the workflow
> > > > > > > soon.
> > > > > > >
> > > > > > >
> > > > > > > [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > > > > > [3] https://github.com/apache/flink-benchmarks/blob/master/regression_report.py
> > > > > > > [4] http://codespeed.dak8s.net:8080
> > > > > > > [5] https://lists.apache.org/thread/jzljp4233799vwwqnr0vc9wgqs0xj1ro
> > > > > > > [6] https://issues.apache.org/jira/browse/FLINK-18614
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Hangxiang.
> > >
> >
>
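
For readers who want a concrete picture of the check described in the
original announcement (a notification when the last few benchmark results
are significantly worse than the baseline), here is a minimal sketch. It is
not the logic of regression_report.py: the function name, the window sizes,
and the 5% tolerance are assumptions chosen for illustration, and it assumes
that higher scores are better.

```python
from statistics import median


def is_regression(scores, recent_n=3, baseline_n=20, tolerance=0.05):
    """Return True if the recent median is clearly below the baseline median.

    Hypothetical helper, not part of flink-benchmarks.
    scores:     chronological benchmark scores, higher is better (assumption)
    recent_n:   how many of the latest results form the "last few" window
    baseline_n: how many results before that window form the baseline
    tolerance:  allowed relative drop before a regression is flagged
    """
    if recent_n < 1 or len(scores) < recent_n + 1:
        return False  # not enough history to compare anything

    recent = scores[-recent_n:]
    # Results immediately preceding the recent window; shorter if history is short.
    baseline = scores[-(recent_n + baseline_n):-recent_n]

    base = median(baseline)
    if base <= 0:
        return False  # a relative comparison is meaningless here

    # Medians damp single noisy runs, but false positives and false
    # negatives remain possible, as the announcement points out.
    return median(recent) < base * (1 - tolerance)


if __name__ == "__main__":
    history = [100, 101, 99, 100, 102, 100, 90, 89, 91]
    if is_regression(history):
        print("possible regression: notify the benchmark channel")
```

Comparing medians rather than single runs is one simple way to reduce noise
from a single shared machine; it does not remove the need for a human to
confirm the regression before filing a ticket.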
