Re: Possible Python SDK performance regression

Mark Liu Mon, 09 Sep 2019 09:59:39 -0700

>
> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It
> would be good to have alerts on benchmarks. Do we have such an ability
> today?
>


As for regression detection, we have a Jenkins job,
beam_PerformanceTests_Analysis
<https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/>,
which analyzes metrics in BigQuery and reports a summary to the job console
output. However, not all jobs are registered with this analyzer, and
currently no further alerting (e.g. email / Slack) is integrated with it.
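
To illustrate the kind of check such an analysis could run, here is a
minimal sketch. It is not the code of beam_PerformanceTests_Analysis; the
function name, window sizes and the 10% threshold are all assumptions made
for illustration. It compares the median of the most recent runs with the
median of the runs just before them:

```python
from statistics import median


def detect_regression(runtimes, baseline_size=10, recent_size=3, threshold=0.10):
    """Compare recent benchmark runs against the runs that preceded them.

    `runtimes` is a chronological list of run times (e.g. seconds), oldest first.
    All parameter names and defaults are hypothetical, chosen for illustration.
    Returns (is_regression, relative_change).
    """
    if len(runtimes) < baseline_size + recent_size:
        raise ValueError("not enough runs to compare")
    # Baseline window: the runs immediately before the recent window.
    baseline = median(runtimes[-(baseline_size + recent_size):-recent_size])
    # Recent window: the last `recent_size` runs.
    recent = median(runtimes[-recent_size:])
    change = (recent - baseline) / baseline
    # Flag only slowdowns that exceed the threshold.
    return change > threshold, change


# Hypothetical data: ten stable runs followed by three slower ones.
flagged, change = detect_regression([100.0] * 10 + [125.0, 126.0, 124.0])
```

Medians are used here rather than means so that a single noisy run does not
trigger a false alarm; a real alerting setup would also need to decide where
the result gets sent.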

There is ongoing work to add alerting to benchmarks. Kasia and Kamil are
investigating Prometheus + Grafana, and Manisha and I are looking into
mako.dev.

Mark

On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <al...@google.com> wrote:

> I agree, let's investigate. Thomas could you file JIRAs once you have
> additional information.
>
> Valentyn, I think the performance regression could be investigated now, by
> running whatever benchmarks are available against 2.14, 2.15 and head
> to see whether the same regression can be reproduced.
>
> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <valen...@google.com>
> wrote:
>
>> Sounds like these regressions need to be investigated ahead of 2.16.0
>> release.
>>
>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <t...@apache.org> wrote:
>>
>>>
>>>
>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev
>>>>> <valentyn@google.com> wrote:
>>>>>
>>>>>> +Mark Liu <mark...@google.com> has added some benchmarks running
>>>>>> across multiple Python versions. Specifically, we run a 1 GB wordcount
>>>>>> job on the Dataflow runner on Python 2.7 and 3.5-3.7. The benchmarks do
>>>>>> not have alerting configured and, to my knowledge, are not actively
>>>>>> monitored yet.
>>>>>>
>>>>>
>>>>> Are there any benchmarks for streaming? Streaming and batch are quite
>>>>> different runtime paths. And some of the issues can only be
>>>>> identified with longer running processes through metrics. It would be good
>>>>> to verify utilization of memory, cpu etc.
>>>>>
>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory leak
>>>>> in the Python worker (Py 2.7).
>>>>>
>>>>
>>>> Do you have more details on this one?
>>>>
>>>
>>> Unfortunately only that at the moment. The workers eat up all memory and
>>> eventually crash. Reverted back to 2.14 / Py 3.6 and the issue is gone.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>> Thomas, is it possible for you to do the bisection using SDK code
>>>>>> from master at various commits to narrow down the regression on your end?
>>>>>>
>>>>>
>>>>> I don't know how soon I will get to it. It's of course possible, but
>>>>> expensive due to having to rebase the fork, build and deploy an
>>>>> entire stack of stuff for each iteration. The pipeline itself is super
>>>>> simple. We need this testbed as part of Beam. It would be nice to be able
>>>>> to pick an update and have more confidence that the baseline has not
>>>>> slipped.
>>>>>
>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>>> [2]
>>>>>> https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>>> [3]
>>>>>> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks in
>>>>>>> different python versions? Was there a recent change that is specific to
>>>>>>> python 3.x ?
>>>>>>>
>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>>
>>>>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>>>>
>>>>>>>> If there is a framework in place to add a streaming test, that
>>>>>>>> would be great. We would use what we have internally as a starting
>>>>>>>> point.
>>>>>>>>
>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The workload is quite different. What I have is streaming with
>>>>>>>>>> state and timers.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> We only recently started running Chicago Taxi Example. +Michał
>>>>>>>>>>> Walenia <michal.wale...@polidea.com> I don't see it in the
>>>>>>>>>>> dashboards. Do you know if it's possible to see any trends in the 
>>>>>>>>>>> data?
>>>>>>>>>>>
>>>>>>>>>>> We have a few tests running now:
>>>>>>>>>>> - Combine tests:
>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>> - GBK tests:
>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>
>>>>>>>>>>> They don't seem to show a very drastic jump either, but they
>>>>>>>>>>> aren't very old.
>>>>>>>>>>>
>>>>>>>>>>> There is also work ongoing to add alerting for this sort of
>>>>>>>>>>> regression by Kasia and Kamil (added). The work is not there yet
>>>>>>>>>>> (it's in progress).
>>>>>>>>>>> Best
>>>>>>>>>>> -P.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It probably won't be practical to do a bisect due to the high
>>>>>>>>>>>> cost of each iteration with our fork/deploy setup.
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps it is time to set up something with the synthetic source
>>>>>>>>>>>> that works with just Beam as a dependency.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> I agree with this.
>>>>>>>>>
>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an easy-to-use
>>>>>>>>> framework for using the synthetic source in benchmarks?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not very
>>>>>>>>>>>>> useful in this case because they do not go back more than a
>>>>>>>>>>>>> month and are not very comprehensive. I do not see a jump
>>>>>>>>>>>>> there. Thomas, would it be possible to bisect to find what
>>>>>>>>>>>>> commit caused the regression?
>>>>>>>>>>>>>
>>>>>>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any python on
>>>>>>>>>>>>> flink benchmarks for chicago example?
>>>>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou
>>>>>>>>>>>>> <yifan...@google.com> It would be good to have alerts on
>>>>>>>>>>>>> benchmarks. Do we have such an ability today?
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as
>>>>>>>>>>>>>> part of release verification (or otherwise as well)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see what appears to be a regression in master (compared to
>>>>>>>>>>>>>> 2.14) with our in-house application (~ 25% jump in CPU
>>>>>>>>>>>>>> utilization and a corresponding drop in throughput).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wanted to see if there is anything available to verify that
>>>>>>>>>>>>>> within Beam.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
