Thomas, did you have a chance to open a Jira for the streaming regression you observed? If not, could you please do so and cc +Ankur Goenka <goe...@google.com>? I talked with Ankur offline and he is also interested in this regression.
I opened:
- https://issues.apache.org/jira/browse/BEAM-8198 for the batch regression.
- https://issues.apache.org/jira/browse/BEAM-8199 to improve tooling around performance monitoring.
- https://issues.apache.org/jira/browse/BEAM-8200 to add benchmarks for streaming.

I cc'ed some folks, however not everyone. Manisha, I could not find your
username in Jira; feel free to cc or assign BEAM-8199
<https://issues.apache.org/jira/browse/BEAM-8199> to yourself if that is
something you are actively working on.

Thanks,
Valentyn

On Mon, Sep 9, 2019 at 9:59 AM Mark Liu <mark...@google.com> wrote:

>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It
>> would be good to have alerts on benchmarks. Do we have such an ability
>> today?
>
> As for regression detection, we have a Jenkins job
> beam_PerformanceTests_Analysis
> <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/>
> which analyzes metrics in BigQuery and reports a summary to the job
> console output. However, not all jobs are registered with this analyzer,
> and no further alerting (e.g. email / Slack) is currently integrated
> with it.
>
> There is ongoing work to add alerting to benchmarks. Kasia and Kamil are
> investigating Prometheus + Grafana, and Manisha and I are looking into
> mako.dev.
>
> Mark
>
> On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <al...@google.com> wrote:
>
>> I agree, let's investigate. Thomas, could you file JIRAs once you have
>> additional information?
>>
>> Valentyn, I think the performance regression could be investigated now,
>> by running whatever benchmarks are available against 2.14, 2.15, and
>> head and seeing whether the same regression can be reproduced.
>>
>> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> Sounds like these regressions need to be investigated ahead of the
>>> 2.16.0 release.
>>>
>>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <t...@apache.org> wrote:
>>>
>>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev
>>>>>> <valentyn@google.com> wrote:
>>>>>>
>>>>>>> +Mark Liu <mark...@google.com> has added some benchmarks running
>>>>>>> across multiple Python versions. Specifically, we run a 1 GB
>>>>>>> wordcount job on the Dataflow runner on Python 2.7 and 3.5-3.7.
>>>>>>> The benchmarks do not have alerting configured and, to my
>>>>>>> knowledge, are not actively monitored yet.
>>>>>>
>>>>>> Are there any benchmarks for streaming? Streaming and batch are
>>>>>> quite different runtime paths, and some issues can only be
>>>>>> identified through metrics on longer-running processes. It would
>>>>>> be good to verify utilization of memory, CPU, etc.
>>>>>>
>>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory
>>>>>> leak in the Python worker (Py 2.7).
>>>>>
>>>>> Do you have more details on this one?
>>>>
>>>> Unfortunately only that at the moment. The workers eat up all memory
>>>> and eventually crash. Reverted back to 2.14 / Py 3.6 and the issue
>>>> is gone.
>>>>
>>>>>>> Thomas, is it possible for you to do the bisection using SDK code
>>>>>>> from master at various commits to narrow down the regression on
>>>>>>> your end?
>>>>>>
>>>>>> I don't know how soon I will get to it. It's of course possible,
>>>>>> but expensive due to having to rebase the fork and build and deploy
>>>>>> an entire stack of stuff for each iteration. The pipeline itself is
>>>>>> super simple. We need this testbed as part of Beam. It would be
>>>>>> nice to be able to pick an update and have more confidence that the
>>>>>> baseline has not slipped.
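Once a per-commit benchmark run is scripted, `git bisect run` can at least
automate the search. A rough sketch of such a driver script follows; the
build command, benchmark script, baseline throughput, and threshold are all
placeholders rather than existing Beam tooling:

    #!/usr/bin/env python
    """Bisect driver, used as: git bisect run python bisect_check.py

    Exit 0 marks the current commit good, exit 1 marks it bad, and
    exit 125 tells git to skip it (e.g. when the build fails).
    """
    import subprocess
    import sys

    BASELINE_MSGS_PER_SEC = 1000.0  # placeholder: throughput measured on 2.14
    THRESHOLD = 0.9                 # bad = below 90% of the baseline


    def main():
        # Build the Python SDK at the currently checked-out commit; skip
        # commits that do not build rather than marking them good or bad.
        if subprocess.call(['./gradlew', ':sdks:python:sdist']) != 0:
            sys.exit(125)

        # run_benchmark.sh is a placeholder for whatever deploys the stack,
        # submits the test pipeline, and prints throughput on its last line.
        out = subprocess.check_output(['./run_benchmark.sh']).decode()
        throughput = float(out.strip().splitlines()[-1])

        sys.exit(0 if throughput >= BASELINE_MSGS_PER_SEC * THRESHOLD else 1)


    if __name__ == '__main__':
        main()

Started with `git bisect start <bad-commit> <good-commit>`, git then checks
out commits and calls the script until it isolates the offending change.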
>>>>>>> [1] https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>>>> [2] https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>>>> [3] https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>>>>>
>>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>>
>>>>>>>> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks
>>>>>>>> in different Python versions? Was there a recent change that is
>>>>>>>> specific to Python 3.x?
>>>>>>>>
>>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>>>>>
>>>>>>>>> If there is a framework in place to add a streaming test, that
>>>>>>>>> would be great. We would use what we have internally as a
>>>>>>>>> starting point.
>>>>>>>>>
>>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> The workload is quite different. What I have is streaming with
>>>>>>>>>>> state and timers (a sketch of that pattern appears at the end
>>>>>>>>>>> of this thread).
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We only recently started running the Chicago Taxi Example.
>>>>>>>>>>>> +Michał Walenia <michal.wale...@polidea.com> I don't see it in
>>>>>>>>>>>> the dashboards. Do you know if it's possible to see any trends
>>>>>>>>>>>> in the data?
>>>>>>>>>>>>
>>>>>>>>>>>> We have a few tests running now:
>>>>>>>>>>>> - Combine tests:
>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>> - GBK tests:
>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>
>>>>>>>>>>>> They don't seem to show a very drastic jump either, but they
>>>>>>>>>>>> aren't very old.
>>>>>>>>>>>>
>>>>>>>>>>>> There is also ongoing work by Kasia and Kamil (added) to add
>>>>>>>>>>>> alerting for this sort of regression. It is not there yet
>>>>>>>>>>>> (it's in progress).
>>>>>>>>>>>>
>>>>>>>>>>>> Best
>>>>>>>>>>>> -P.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It probably won't be practical to do a bisect due to the high
>>>>>>>>>>>>> cost of each iteration with our fork/deploy setup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps it is time to set up something with the synthetic
>>>>>>>>>>>>> source that works with just Beam as a dependency.
>>>>>>>>>>
>>>>>>>>>> I agree with this.
>>>>>>>>>>
>>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an
>>>>>>>>>> easy-to-use framework for using the synthetic source in
>>>>>>>>>> benchmarks?
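A pipeline that depends on nothing but Beam could look roughly like the
sketch below, built on the SDK's apache_beam.testing.synthetic_pipeline
module. The spec values are illustrative, and this only covers batch; a
streaming variant would still need an unbounded synthetic source, which is
part of what BEAM-8200 asks for:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.testing.synthetic_pipeline import SyntheticSource
    from apache_beam.testing.synthetic_pipeline import SyntheticStep

    # Illustrative spec: 10M records of 10-byte keys + 90-byte values (~1 GB).
    source_spec = {
        'numRecords': 10 * 1000 * 1000,
        'keySizeBytes': 10,
        'valueSizeBytes': 90,
    }

    with beam.Pipeline(options=PipelineOptions()) as p:
        _ = (p
             | 'Read' >> beam.io.Read(SyntheticSource(source_spec))
             | 'Step' >> beam.ParDo(SyntheticStep(
                 per_element_delay_sec=0.0005,  # simulate per-record work
                 per_bundle_delay_sec=0,
                 output_records_per_input_record=1,
                 output_filter_ratio=0))
             | 'GroupByKey' >> beam.GroupByKey())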
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not very
>>>>>>>>>>>>>> useful in this case because they do not go back more than a
>>>>>>>>>>>>>> month and are not very comprehensive. I do not see a jump
>>>>>>>>>>>>>> there. Thomas, would it be possible to bisect to find what
>>>>>>>>>>>>>> commit caused the regression?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any
>>>>>>>>>>>>>> Python-on-Flink benchmarks for the Chicago example?
>>>>>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou
>>>>>>>>>>>>>> <yifan...@google.com> It would be good to have alerts on
>>>>>>>>>>>>>> benchmarks. Do we have such an ability today?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as
>>>>>>>>>>>>>>> part of release verification (or otherwise as well)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see what appears to be a regression in master (compared
>>>>>>>>>>>>>>> to 2.14) with our in-house application (~25% jump in CPU
>>>>>>>>>>>>>>> utilization and a corresponding drop in throughput).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I wanted to see if there is anything available to verify
>>>>>>>>>>>>>>> that within Beam.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Thomas
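For reference, the "streaming with state and timers" workload mentioned
earlier in the thread corresponds, schematically, to a keyed stateful DoFn
like the sketch below. The buffering logic and the 60-second event-time
flush are illustrative, not the actual in-house pipeline:

    import apache_beam as beam
    from apache_beam.coders import VarIntCoder
    from apache_beam.transforms.timeutil import TimeDomain
    from apache_beam.transforms.userstate import BagStateSpec
    from apache_beam.transforms.userstate import TimerSpec
    from apache_beam.transforms.userstate import on_timer


    class BufferingDoFn(beam.DoFn):
        """Buffers values per key, flushing when a watermark timer fires."""
        BUFFER = BagStateSpec('buffer', VarIntCoder())
        FLUSH = TimerSpec('flush', TimeDomain.WATERMARK)

        def process(self,
                    element,  # state/timers require keyed (k, v) input
                    timestamp=beam.DoFn.TimestampParam,
                    buffer=beam.DoFn.StateParam(BUFFER),
                    flush_timer=beam.DoFn.TimerParam(FLUSH)):
            _, value = element
            buffer.add(value)
            # Illustrative policy: flush 60s past this element's event time.
            flush_timer.set(timestamp + 60)

        @on_timer(FLUSH)
        def flush(self, buffer=beam.DoFn.StateParam(BUFFER)):
            yield sum(buffer.read())
            buffer.clear()

Applied as `keyed_pcoll | beam.ParDo(BufferingDoFn())`; exercising this
path against a synthetic unbounded source is exactly the gap that
BEAM-8200 is meant to close.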