Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica to Apache Flink Slack Channel

Martijn Visser Mon, 28 Nov 2022 23:54:39 -0800

Hi,

Is there any update to be expected on the benchmark? I see results of the
benchmark being posted to Slack, but it appears that it's not being
monitored and no follow-up actions are being taken. I think it's currently
lacking a process on how to interpret the results and what action should
be taken and by whom.


Best regards,

Martijn

On Thu, Nov 3, 2022 at 12:22 PM Jing Ge <[email protected]> wrote:

> Thanks, Yanfei, for driving this!
>
> Looking forward to further discussion w.r.t. the workflow.
>
> Best regards,
> Jing
>
> On Mon, Oct 31, 2022 at 6:04 PM Mason Chen <[email protected]> wrote:
>
> > +1, thanks for driving this!
> >
> > On a side note, can we also ensure that a performance summary report for
> > Flink major version upgrades is included in the release notes once this
> > infrastructure becomes mature? From the user perspective, it would be nice
> > to know what the expected (or unexpected) regressions in a major version
> > upgrade are. I've seen the community do something like this before (e.g.
> > the major RocksDB version bump in 1.14?) and it was quite valuable to know
> > that upfront!
> >
> > Best,
> > Mason
> >
> > On Fri, Oct 28, 2022 at 1:46 AM weijie guo <[email protected]> wrote:
> >
> > > Thanks, Yanfei, for driving this.
> > >
> > > It makes performance regressions easy to find. I have recently made some
> > > improvements to the scheduling-related parts, so your work is very
> > > important for ensuring that these changes do not cause unexpected
> > > problems.
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > On Fri, Oct 28, 2022 at 4:03 PM Congxian Qiu <[email protected]> wrote:
> > >
> > > > Thanks for driving this and making the performance monitoring public.
> > > > This helps us notice and resolve performance problems quickly.
> > > >
> > > > Looking forward to the workflow and detailed descriptions of
> > > > flink-dev-benchmarks.
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > >
> > > > On Thu, Oct 27, 2022 at 12:41 PM Yun Tang <[email protected]> wrote:
> > > >
> > > > > Thanks, Yanfei, for driving this to monitor the performance in the
> > > > > Apache Flink Slack channel.
> > > > >
> > > > > Looking forward to the workflow and detailed descriptions of
> > > > > flink-dev-benchmarks.
> > > > >
> > > > > Best
> > > > > Yun Tang
> > > > > ________________________________
> > > > > From: Hangxiang Yu <[email protected]>
> > > > > Sent: Thursday, October 27, 2022 10:59
> > > > > To: [email protected] <[email protected]>
> > > > > Subject: Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica
> > > > > to Apache Flink Slack Channel
> > > > >
> > > > > Hi, Yanfei.
> > > > > Thanks for driving this.
> > > > > It could help us detect and resolve regression problems quickly and
> > > > > officially.
> > > > > I'd like to join as a maintainer.
> > > > > Looking forward to the workflow.
> > > > >
> > > > > On Wed, Oct 26, 2022 at 5:18 PM Yuan Mei <[email protected]> wrote:
> > > > >
> > > > > > Thanks, Yanfei, for driving this and making the performance
> > > > > > monitoring publicly available.
> > > > > >
> > > > > > Looking forward to seeing the workflow, and more details as Martijn
> > > > > > mentioned.
> > > > > >
> > > > > > Best
> > > > > > Yuan
> > > > > >
> > > > > > On Wed, Oct 26, 2022 at 2:59 PM Martijn Visser <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Yanfei Lei,
> > > > > > >
> > > > > > > Thanks for setting this up! It would be interesting to also know
> > > > > > > which aspects of Flink are monitored for "performance". I'm
> > > > > > > assuming there are specific pieces of functionality that are
> > > > > > > performance tested, but it would be great if this were written
> > > > > > > down somewhere (alongside a procedure for detecting a regression
> > > > > > > and what the next steps should be).
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Martijn
> > > > > > >
> > > > > > > On Wed, Oct 26, 2022 at 8:21 AM Zakelly Lan <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi yanfei,
> > > > > > > >
> > > > > > > > Thanks for driving this! It's a great help.
> > > > > > > >
> > > > > > > > I would like to join as a maintainer.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Zakelly
> > > > > > > >
> > > > > > > > On Wed, Oct 26, 2022 at 11:32 AM yanfei lei <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > As discussed earlier, we planned to create a benchmark channel in
> > > > > > > > > the Apache Flink Slack[1], but the plan was shelved for a
> > > > > > > > > while[2]. So I went on with this work and created the
> > > > > > > > > #flink-dev-benchmarks channel for performance regression
> > > > > > > > > notifications.
> > > > > > > > >
> > > > > > > > > We have a regression report script[3] that runs daily, and a
> > > > > > > > > notification is sent to the Slack channel when the last few
> > > > > > > > > benchmark results are significantly worse than the baseline.
> > > > > > > > > Note that regressions are detected by a simple script, which may
> > > > > > > > > produce false positives and false negatives. Also, all benchmarks
> > > > > > > > > are executed on one physical machine[4], which is provided by
> > > > > > > > > Ververica (Alibaba)[5], so hardware issues may affect
> > > > > > > > > performance, as in "[FLINK-18614
> > > > > > > > > <https://issues.apache.org/jira/browse/FLINK-18614>] Performance
> > > > > > > > > regression 2020.07.13"[6].
> > > > > > > > > After the migration, we need a procedure to watch over the
> > > > > > > > > overall performance of the Flink code together. For example, if a
> > > > > > > > > regression occurs, the cause needs to be investigated and the
> > > > > > > > > problem resolved. In the past, this procedure was maintained
> > > > > > > > > internally within Ververica, but we think making the procedure
> > > > > > > > > public would benefit all. I volunteer to serve as one of the
> > > > > > > > > initial maintainers, and would be glad if more contributors
> > > > > > > > > joined me. I will also prepare some guidelines to help others get
> > > > > > > > > familiar with the workflow, and will start a new thread to
> > > > > > > > > discuss the workflow soon.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1] https://www.mail-archive.com/[email protected]/msg58666.html
> > > > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > > > > > > > [3] https://github.com/apache/flink-benchmarks/blob/master/regression_report.py
> > > > > > > > > [4] http://codespeed.dak8s.net:8080
> > > > > > > > > [5] https://lists.apache.org/thread/jzljp4233799vwwqnr0vc9wgqs0xj1ro
> > > > > > > > > [6] https://issues.apache.org/jira/browse/FLINK-18614
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Hangxiang.
> > > > >
> > > >
> > >
> >
>
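
For readers following the thread, the check described in the quoted announcement (notify when the last few benchmark results are significantly worse than the baseline) could look roughly like the sketch below. This is only an illustration and not the logic of the actual regression_report.py: the function name detect_regression, the window of 3 recent runs, the 5% threshold, and the use of medians are all assumptions made for the example.

```python
from statistics import median


def detect_regression(scores, recent=3, threshold=0.05, higher_is_better=True):
    """Hypothetical check: compare the median of the last `recent` scores
    against the median of all earlier scores (the assumed "baseline").

    Returns True when the recent median is worse than the baseline by more
    than `threshold` (a relative fraction, e.g. 0.05 for 5%).
    """
    if len(scores) <= recent:
        raise ValueError("need more scores than the recent window")

    baseline = median(scores[:-recent])
    latest = median(scores[-recent:])
    if baseline == 0:
        # Relative change is undefined; report no regression in this sketch.
        return False

    change = (latest - baseline) / baseline
    if not higher_is_better:
        # For time-like metrics, an increase is the bad direction.
        change = -change
    return change < -threshold


# Throughput-style benchmark: the last three runs dropped by about 10%.
print(detect_regression([100, 101, 99, 100, 90, 89, 91]))
# Same benchmark with stable recent runs.
print(detect_regression([100, 101, 99, 100, 99, 100, 101]))
```

A fixed threshold like this is exactly the kind of simple rule that can produce the false positives and false negatives mentioned in the announcement, for example when a hardware issue on the single benchmark machine shifts several consecutive results.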
