Hi all,

Thanks for the reminder.

@Matthias

> any updates on the performance tests? ...or more specifically, any updates
> on the script for alerting on performance regressions?

I created a PR for FLINK-27571[1], but it is still under review. Would you like to help take a look? FLINK-27571 only covers the new benchmarks. The information for the existing benchmarks is stored in codespeed's database, which can't be updated by URL request, so I also logged into the Jenkins master and modified codespeed's database directly. "Less is better" is now displayed correctly on the timeline[2].

> Does it make sense to formalize/document the process?

Certainly. I'm preparing a draft to share my experience of finding the commits that caused regressions. Originally I wanted to wait for FLINK-27571 to be merged before starting a discussion; I will share the draft later.

The Slack channel can only provide notice of regressions and some experience on how to locate them; we also need people to take action after a regression happens. Currently it is mainly a few volunteers who do this, as with FLINK-30015[3] and FLINK-30623[4]. Many thanks to Martijn for his contribution. As for whether to add these responsibilities to the release manager role, I think we need to hear other people's opinions.

@Martijn Thanks for creating these tickets. For FLINK-30623 and FLINK-30624[5], @Hangxiang and I have located the corresponding commit and pinged the corresponding submitter. Regressions cannot always be avoided, so I totally agree that this work needs to be formalized as soon as possible so that regressions get fixed.
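For readers wondering why the "less is better" setting matters for regression alerts, a minimal sketch follows. It is not the logic in regression_report.py; the function name `is_regression`, the median comparison and the 5% threshold are all assumptions made for illustration.

```python
from statistics import median


def is_regression(baseline, recent, less_is_better=False, threshold=0.05):
    """Return True if the recent results look worse than the baseline.

    Hypothetical helper: the name, the use of medians and the default
    threshold are illustrative assumptions, not the project's script.
    """
    base = median(baseline)
    current = median(recent)
    if base == 0:
        # Relative change is undefined for a zero baseline; do not alert.
        return False
    change = (current - base) / abs(base)
    # For "less is better" benchmarks (e.g. time to deploy a job) an
    # increase is a regression; for throughput-style benchmarks a
    # decrease is a regression.
    if less_is_better:
        return change > threshold
    return change < -threshold
```

With such a check, a deployment-time benchmark that rises from 100 to 110 is flagged only when `less_is_better=True`; without the flag the same change would be read as an improvement.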
[1] https://issues.apache.org/jira/browse/FLINK-27571
[2] http://codespeed.dak8s.net:8000/timeline/#/?ben=createScheduler.BATCH&extr=on&quarts=on&equid=off&env=2&revs=200&exe=1,3,5,6,8,9
[3] https://issues.apache.org/jira/browse/FLINK-30015
[4] https://issues.apache.org/jira/browse/FLINK-30623
[5] https://issues.apache.org/jira/browse/FLINK-30624

Best regards,
Yanfei

Martijn Visser <martijnvis...@apache.org> wrote on Wed, Jan 11, 2023 at 01:11:

> Hi all,
>
> Related to Matthias' email, I've checked the notifications in the Slack channel and noticed three major benchmark regressions. In the end, I've decided to create Jira tickets for them [1] [2] [3], but I do agree that this work needs to be formalized as soon as possible to avoid regressions. It would also be great to include a process on how these regressions will be fixed, because I have no idea who to ping/notify that these regressions have occurred.
>
> Best regards,
>
> Martijn
>
> [1] https://issues.apache.org/jira/browse/FLINK-30623
> [2] https://issues.apache.org/jira/browse/FLINK-30624
> [3] https://issues.apache.org/jira/browse/FLINK-30625
>
> On Tue, Jan 10, 2023 at 1:56 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:
>
> > Hi Yanfei,
> > any updates on the performance tests? ...or more specifically, any updates on the script for alerting on performance regressions?
> >
> > Does it make sense to formalize/document the process? Currently, the release management doesn't do anything in terms of performance test monitoring. Therefore, performance regressions are not necessarily identified actively (in contrast to CI instabilities). Or is this covered by the PMC? It would be interesting to know whether there's someone to reach out to who's monitoring the regression tests regularly. Would it make sense for this person to join the release calls?
> >
> > Or shall we work on formalizing/documenting the process and integrating this responsibility into what the release manager(s) are in charge of? My concern with that approach is that contributors might be less willing to volunteer in the release management if we collect everything in one role. Alternatively, we could split the release manager role up into sub-roles that contributors can volunteer for in a release (e.g. CI monitoring, performance test monitoring, Jira maintenance, ... just coming up with random tasks here).
> >
> > Alternatively, we could leave everything as is and just respond if there's some complaint. I'm curious about your (and others') opinions.
> >
> > Matthias
> >
> > On Tue, Nov 29, 2022 at 2:13 PM Yanfei Lei <fredia...@gmail.com> wrote:
> >
> > > Hi Martijn,
> > >
> > > Thanks for bringing this up.
> > >
> > > In the past two months, this channel has helped us find many benchmark failures, like FLINK-29883[1], FLINK-29886[2], FLINK-30015[3] and FLINK-30181[4]. I have also investigated several of the frequently reported regressions and replied under the notifications in the Slack channel (copied here):
> > >
> > > 1. serializerHeavyString <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=2&revs=200>: It has been unstable for a long time; see FLINK-27165[5] for possible reasons.
> > > 2. Regressions are detected by a simple script which may have false positives and false negatives, especially for benchmarks with small absolute values, where small value changes cause large percentage changes.
> > >    See [6] for details.
> > >
> > >    Maybe slidingWindow <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=slidingWindow&extr=on&quarts=on&equid=off&env=2&revs=200> (value~=600), stateBackends.ROCKS <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=stateBackends.ROCKS&extr=on&quarts=on&equid=off&env=2&revs=200> (value~=260) and serializerHeavyString <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=2&revs=200> (value~=170) are not true regressions.
> > >
> > > 3. For deployAllTasks.STREAMING <http://codespeed.dak8s.net:8000/timeline/#/?exe=8&ben=deployAllTasks.STREAMING&extr=on&quarts=on&equid=off&env=2&revs=200>, the benchmark result is how much time it takes to deploy a job, so the lower the value, the better the performance; see [7] for details. FLINK-27571[8] would fix this problem.
> > >
> > > As mentioned before, regressions are detected by a simple script that is not very stable; FLINK-29825[9] was created to improve the benchmark's stability. I planned to invite more volunteers to monitor it after the regression checking became more stable, but I've been stuck with something else lately, sorry for the late response. Any suggestions on handling benchmark regressions/failures are welcome.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-29883
> > > [2] https://issues.apache.org/jira/browse/FLINK-29886
> > > [3] https://issues.apache.org/jira/browse/FLINK-30015
> > > [4] https://issues.apache.org/jira/browse/FLINK-30181
> > > [5] https://issues.apache.org/jira/browse/FLINK-27165
> > > [6] https://github.com/apache/flink-benchmarks/blob/master/regression_report.py#L132-L136
> > > [7] https://github.com/apache/flink-benchmarks/blob/master/src/main/java/org/apache/flink/scheduler/benchmark/deploying/DeployingTasksInStreamingJobBenchmarkExecutor.java#L58
> > > [8] https://issues.apache.org/jira/browse/FLINK-27571
> > > [9] https://issues.apache.org/jira/browse/FLINK-29825
> > >
> > > Best,
> > >
> > > Yanfei
> > >
> > > Martijn Visser <martijnvis...@apache.org> wrote on Tue, Nov 29, 2022 at 15:54:
> > >
> > > > Hi,
> > > >
> > > > Is there any update to be expected on the benchmark? I see results of the benchmark being posted to Slack, but it appears that it's not being monitored and no follow-up actions are being taken. I think it's currently lacking a process on how to interpret the results and what action should be taken and by whom.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Thu, Nov 3, 2022 at 12:22 PM Jing Ge <j...@ververica.com> wrote:
> > > >
> > > > > Thanks yanfei for driving this!
> > > > >
> > > > > Looking forward to further discussion w.r.t. the workflow.
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > On Mon, Oct 31, 2022 at 6:04 PM Mason Chen <mas.chen6...@gmail.com> wrote:
> > > > >
> > > > > > +1, thanks for driving this!
> > > > > >
> > > > > > On a side note, can we also ensure that a performance summary report for Flink major version upgrades is in release notes, once this infrastructure becomes mature? From the user perspective, it would be nice to know what the expected (or unexpected) regressions in a major version upgrade are. I've seen the community do something like this before (e.g. the major rocksdb version bump in 1.14?) and it was quite valuable to know that upfront!
> > > > > >
> > > > > > Best,
> > > > > > Mason
> > > > > >
> > > > > > On Fri, Oct 28, 2022 at 1:46 AM weijie guo <guoweijieres...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Yanfei for driving this.
> > > > > > >
> > > > > > > It allows us to easily find performance regressions. I have recently made some improvements to the scheduling-related parts, so your work is very important to ensure that these changes do not cause unexpected problems.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Weijie
> > > > > > >
> > > > > > > Congxian Qiu <qcx978132...@gmail.com> wrote on Fri, Oct 28, 2022 at 16:03:
> > > > > > >
> > > > > > > > Thanks for driving this and making the performance monitoring public; this lets us learn about and resolve performance problems quickly.
> > > > > > > >
> > > > > > > > Looking forward to the workflow and detailed descriptions of flink-dev-benchmarks.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Congxian
> > > > > > > >
> > > > > > > > Yun Tang <myas...@live.com> wrote on Thu, Oct 27, 2022 at 12:41:
> > > > > > > >
> > > > > > > > > Thanks, Yanfei, for driving this to monitor the performance in the Apache Flink Slack Channel.
> > > > > > > > >
> > > > > > > > > Looking forward to the workflow and detailed descriptions of flink-dev-benchmarks.
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > > Yun Tang
> > > > > > > > > ________________________________
> > > > > > > > > From: Hangxiang Yu <master...@gmail.com>
> > > > > > > > > Sent: Thursday, October 27, 2022 10:59
> > > > > > > > > To: dev@flink.apache.org <dev@flink.apache.org>
> > > > > > > > > Subject: Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica to Apache Flink Slack Channel
> > > > > > > > >
> > > > > > > > > Hi, Yanfei.
> > > > > > > > > Thanks for driving this.
> > > > > > > > > It could help us to detect and resolve the regression problem quickly and officially.
> > > > > > > > > I'd like to join as a maintainer.
> > > > > > > > > Looking forward to the workflow.
> > > > > > > > >
> > > > > > > > > On Wed, Oct 26, 2022 at 5:18 PM Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks, Yanfei, for driving this and making the performance monitoring publicly available.
> > > > > > > > > >
> > > > > > > > > > Looking forward to seeing the workflow, and more details as Martijn mentioned.
> > > > > > > > > >
> > > > > > > > > > Best
> > > > > > > > > > Yuan
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 26, 2022 at 2:59 PM Martijn Visser <martijnvis...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Yanfei Lei,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for setting this up! It would be interesting to also know which aspects of Flink are monitored for "performance". I'm assuming there are specific pieces of functionality that are performance tested, but it would be great if this would be written down somewhere (next to a procedure how to detect a regression and what should be next steps).
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > >
> > > > > > > > > > > Martijn
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 26, 2022 at 8:21 AM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi yanfei,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for driving this! It's a great help.
> > > > > > > > > > > >
> > > > > > > > > > > > I would like to join as a maintainer.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Zakelly
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Oct 26, 2022 at 11:32 AM yanfei lei <fredia...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > >
> > > > > > > > > > > > > As discussed earlier, we planned to create a benchmark channel in the Apache Flink Slack[1], but the plan was shelved for a while[2]. So I went on with this work and created the #flink-dev-benchmarks channel for performance regression notifications.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We have a regression report script[3] that runs daily, and a notification is sent to the Slack channel when the last few benchmark results are significantly worse than the baseline. Note that regressions are detected by a simple script which may have false positives and false negatives. Also, all benchmarks are executed on one physical machine[4], which is provided by Ververica(Alibaba)[5], so hardware issues may affect performance, as in "[FLINK-18614] Performance regression 2020.07.13"[6].
> > > > > > > > > > > > >
> > > > > > > > > > > > > After the migration, we need a procedure to watch over the overall performance of the Flink code together. For example, if a regression occurs, someone needs to investigate the cause and resolve the problem. In the past, this procedure was maintained internally within Ververica, but we think making the procedure public would benefit all. I volunteer to serve as one of the initial maintainers, and would be glad if more contributors can join me. I will also prepare some guidelines to help others get familiar with the workflow. I will start a new thread to discuss the workflow soon.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > > > > > > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > > > > > > > > > > > [3] https://github.com/apache/flink-benchmarks/blob/master/regression_report.py
> > > > > > > > > > > > > [4] http://codespeed.dak8s.net:8080
> > > > > > > > > > > > > [5] https://lists.apache.org/thread/jzljp4233799vwwqnr0vc9wgqs0xj1ro
> > > > > > > > > > > > > [6] https://issues.apache.org/jira/browse/FLINK-18614
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best,
> > > > > > > > > Hangxiang.