> > +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It > would be good to have alerts on benchmarks. Do we have such an ability > today? >
As for regression detection, we have a Jenkins job, beam_PerformanceTests_Analysis <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/>, which analyzes metrics in BigQuery and reports a summary to the job console output. However, not all jobs are registered with this analyzer, and currently no further alerts (e.g. email / Slack) are integrated with it. There is ongoing work to add alerting to benchmarks: Kasia and Kamil are investigating Prometheus + Grafana, and Manisha and I are looking into mako.dev.

Mark

On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <al...@google.com> wrote: > I agree, let's investigate. Thomas could you file JIRAs once you have > additional information. > > Valentyn, I think the performance regression could be investigated now, by > running whatever benchmarks that is available against 2.14, 2.15 and head > and see if the same regression could be reproduced. > > On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <valen...@google.com> > wrote: > >> Sounds like these regressions need to be investigated ahead of 2.16.0 >> release. >> >> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <t...@apache.org> wrote: >> >>> >>> >>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote: >>> >>>> >>>> >>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote: >>>> >>>>> >>>>> >>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev <valentyn@ >>>>> google.com> wrote: >>>>> >>>>>> +Mark Liu <mark...@google.com> has added some benchmarks running >>>>>> across multiple Python versions. Specifically we run 1 GB wordcount job >>>>>> on >>>>>> Dataflow runner on Python 2.7, 3.5-3.7. The benchmarks do not have >>>>>> configured alerting and to my knowledge are not actively monitored yet. >>>>>> >>>>> >>>>> Are there any benchmarks for streaming? Streaming and batch are quite >>>>> different runtime paths. And some of the issues can only be >>>>> identified with longer running processes through metrics.
It would be good >>>>> to verify utilization of memory, cpu etc. >>>>> >>>>> I additionally discovered that our 2.16 upgrade exhibits a memory leak >>>>> in the Python worker (Py 2.7). >>>>> >>>> >>>> Do you have more details on this one? >>>> >>> >>> Unfortunately only that at the moment. The workers eat up all memory and >>> eventually crash. Reverted back to 2.14 / Py 3.6 and the issue is gone. >>> >>> >>>> >>>> >>>>> >>>>> >>>>>> Thomas, is it possible for you to do the bisection using SDK code >>>>>> from master at various commits to narrow down the regression on your end? >>>>>> >>>>> >>>>> I don't know how soon I will get to it. It's of course possible, but >>>>> expensive due to having to rebase the fork, build and deploy an >>>>> entire stack of stuff for each iteration. The pipeline itself is super >>>>> simple. We need this testbed as part of Beam. It would be nice to be able >>>>> to pick an update and have more confidence that the baseline has not >>>>> slipped. >>>>> >>>>> >>>>>> >>>>>> [1] >>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328 >>>>>> [2] >>>>>> https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view >>>>>> [3] >>>>>> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5 >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote: >>>>>> >>>>>>> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks in >>>>>>> different python versions? Was there a recent change that is specific to >>>>>>> python 3.x ? >>>>>>> >>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote: >>>>>>> >>>>>>>> The issue is only visible with Python 3.6, not 2.7. >>>>>>>> >>>>>>>> If there is a framework in place to add a streaming test, that >>>>>>>> would be great. We would use what we have internally as starting point. 
>>>>>>>> >>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> The workload is quite different. What I have is streaming with >>>>>>>>>> state and timers. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> We only recently started running Chicago Taxi Example. +Michał >>>>>>>>>>> Walenia <michal.wale...@polidea.com> I don't see it in the >>>>>>>>>>> dashboards. Do you know if it's possible to see any trends in the >>>>>>>>>>> data? >>>>>>>>>>> >>>>>>>>>>> We have a few tests running now: >>>>>>>>>>> - Combine tests: >>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373 >>>>>>>>>>> - GBK tests: >>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373 >>>>>>>>>>> >>>>>>>>>>> They don't seem to show a very drastic jump either, but they >>>>>>>>>>> aren't very old. >>>>>>>>>>> >>>>>>>>>>> There is also work ongoing to add alerting for this sort of >>>>>>>>>>> regressions by Kasia and Kamil (added). The work is not there yet >>>>>>>>>>> (it's in >>>>>>>>>>> progress). >>>>>>>>>>> Best >>>>>>>>>>> -P. >>>>>>>>>>> >>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> It probably won't be practical to do a bisect due to the high >>>>>>>>>>>> cost of each iteration with our fork/deploy setup. >>>>>>>>>>>> >>>>>>>>>>>> Perhaps it is time to setup something with the synthetic source >>>>>>>>>>>> that works just with Beam as dependency. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> I agree with this.
>>>>>>>>> >>>>>>>>> Pablo, Kasia, Kamil, does the new benchmarks give us a easy to use >>>>>>>>> framework for using synthetic source in benchmarks? >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> There are a few in this dashboard [1], but not very useful in >>>>>>>>>>>>> this case because they do not go back more than a month and not >>>>>>>>>>>>> very >>>>>>>>>>>>> comprehensive. I do not see a jump there. Thomas, would it be >>>>>>>>>>>>> possible to >>>>>>>>>>>>> bisect to find what commit caused the regression? >>>>>>>>>>>>> >>>>>>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any python on >>>>>>>>>>>>> flink benchmarks for chicago example? >>>>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou >>>>>>>>>>>>> <yifan...@google.com> It would be good to have alerts on >>>>>>>>>>>>> benchmarks. Do we have such an ability today? >>>>>>>>>>>>> >>>>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as >>>>>>>>>>>>>> part of release verification (or otherwise as well)? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I see what appears to be a regression in master (compared to >>>>>>>>>>>>>> 2.14) with our in-house application (~ 25% jump in cpu >>>>>>>>>>>>>> utilization and >>>>>>>>>>>>>> corresponds drop in throughput). >>>>>>>>>>>>>> >>>>>>>>>>>>>> I wanted to see if there is anything available to verify that >>>>>>>>>>>>>> within Beam. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Thomas >>>>>>>>>>>>>> >>>>>>>>>>>>>>