Thomas, did you have a chance to open a Jira for the streaming regression you observed? If not, could you please do so and cc +Ankur Goenka <goe...@google.com>? I talked with Ankur offline and he is also interested in this regression.
I opened:
- https://issues.apache.org/jira/browse/BEAM-8198 for the batch regression.
- https://issues.apache.org/jira/browse/BEAM-8199 to improve tooling around performance monitoring.
- https://issues.apache.org/jira/browse/BEAM-8200 to add benchmarks for streaming.

I cc'ed some folks, however not everyone. Manisha, I could not find your
username in Jira; feel free to cc or assign BEAM-8199
<https://issues.apache.org/jira/browse/BEAM-8199> to yourself if that is
something you are actively working on.

Thanks,
Valentyn

On Mon, Sep 9, 2019 at 9:59 AM Mark Liu <mark...@google.com> wrote:

>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It
>> would be good to have alerts on benchmarks. Do we have such an ability
>> today?
>
> As for regression detection, we have a Jenkins job
> beam_PerformanceTests_Analysis
> <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/>
> which analyzes metrics in BigQuery and reports a summary to the job
> console output. However, not all jobs are registered with this analyzer,
> and no further alerting (e.g. email / Slack) is currently integrated
> with it.
>
> There is ongoing work to add alerting to benchmarks. Kasia and Kamil are
> investigating Prometheus + Grafana, and Manisha and I are looking into
> mako.dev.
>
> Mark
>
> On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <al...@google.com> wrote:
>
>> I agree, let's investigate. Thomas, could you file JIRAs once you have
>> additional information?
>>
>> Valentyn, I think the performance regression could be investigated now,
>> by running whatever benchmarks are available against 2.14, 2.15, and
>> head and seeing whether the same regression can be reproduced.
>>
>> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> Sounds like these regressions need to be investigated ahead of the
>>> 2.16.0 release.
>>>
>>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <t...@apache.org> wrote:
>>>
>>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev
>>>>>> <valentyn@google.com> wrote:
>>>>>>
>>>>>>> +Mark Liu <mark...@google.com> has added some benchmarks running
>>>>>>> across multiple Python versions. Specifically, we run a 1 GB
>>>>>>> wordcount job on the Dataflow runner on Python 2.7 and 3.5-3.7.
>>>>>>> The benchmarks do not have alerting configured and, to my
>>>>>>> knowledge, are not actively monitored yet.
>>>>>>
>>>>>> Are there any benchmarks for streaming? Streaming and batch are
>>>>>> quite different runtime paths, and some issues can only be
>>>>>> identified through metrics on longer-running processes. It would
>>>>>> be good to verify utilization of memory, CPU, etc.
>>>>>>
>>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory
>>>>>> leak in the Python worker (Py 2.7).
>>>>>
>>>>> Do you have more details on this one?
>>>>
>>>> Unfortunately only that at the moment. The workers eat up all memory
>>>> and eventually crash. Reverted back to 2.14 / Py 3.6 and the issue
>>>> is gone.
>>>>
>>>>>>> Thomas, is it possible for you to do the bisection using SDK code
>>>>>>> from master at various commits to narrow down the regression on
>>>>>>> your end?
>>>>>>
>>>>>> I don't know how soon I will get to it. It's of course possible,
>>>>>> but expensive due to having to rebase the fork and build and deploy
>>>>>> an entire stack of stuff for each iteration. The pipeline itself is
>>>>>> super simple. We need this testbed as part of Beam. It would be
>>>>>> nice to be able to pick an update and have more confidence that the
>>>>>> baseline has not slipped.
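Once a per-commit benchmark run is scripted, `git bisect run` can at least
automate the search. A rough sketch of such a driver script follows; the
build command, benchmark script, baseline throughput, and threshold are all
placeholders rather than existing Beam tooling:

    #!/usr/bin/env python
    """Bisect driver, used as: git bisect run python bisect_check.py

    Exit 0 marks the current commit good, exit 1 marks it bad, and
    exit 125 tells git to skip it (e.g. when the build fails).
    """
    import subprocess
    import sys

    BASELINE_MSGS_PER_SEC = 1000.0  # placeholder: throughput measured on 2.14
    THRESHOLD = 0.9                 # bad = below 90% of the baseline


    def main():
        # Build the Python SDK at the currently checked-out commit; skip
        # commits that do not build rather than marking them good or bad.
        if subprocess.call(['./gradlew', ':sdks:python:sdist']) != 0:
            sys.exit(125)

        # run_benchmark.sh is a placeholder for whatever deploys the stack,
        # submits the test pipeline, and prints throughput on its last line.
        out = subprocess.check_output(['./run_benchmark.sh']).decode()
        throughput = float(out.strip().splitlines()[-1])

        sys.exit(0 if throughput >= BASELINE_MSGS_PER_SEC * THRESHOLD else 1)


    if __name__ == '__main__':
        main()

Started with `git bisect start <bad-commit> <good-commit>`, git then checks
out commits and calls the script until it isolates the offending change.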
>>>>>>> [1] https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>>>> [2] https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>>>> [3] https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>>>>>
>>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>>
>>>>>>>> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks
>>>>>>>> in different Python versions? Was there a recent change that is
>>>>>>>> specific to Python 3.x?
>>>>>>>>
>>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>>>>>
>>>>>>>>> If there is a framework in place to add a streaming test, that
>>>>>>>>> would be great. We would use what we have internally as a
>>>>>>>>> starting point.
>>>>>>>>>
>>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> The workload is quite different. What I have is streaming with
>>>>>>>>>>> state and timers (a sketch of that pattern appears at the end
>>>>>>>>>>> of this thread).
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We only recently started running the Chicago Taxi Example.
>>>>>>>>>>>> +Michał Walenia <michal.wale...@polidea.com> I don't see it in
>>>>>>>>>>>> the dashboards. Do you know if it's possible to see any trends
>>>>>>>>>>>> in the data?
>>>>>>>>>>>>
>>>>>>>>>>>> We have a few tests running now:
>>>>>>>>>>>> - Combine tests:
>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>> - GBK tests:
>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>
>>>>>>>>>>>> They don't seem to show a very drastic jump either, but they
>>>>>>>>>>>> aren't very old.
>>>>>>>>>>>>
>>>>>>>>>>>> There is also ongoing work by Kasia and Kamil (added) to add
>>>>>>>>>>>> alerting for this sort of regression. It is not there yet
>>>>>>>>>>>> (it's in progress).
>>>>>>>>>>>>
>>>>>>>>>>>> Best
>>>>>>>>>>>> -P.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It probably won't be practical to do a bisect due to the high
>>>>>>>>>>>>> cost of each iteration with our fork/deploy setup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps it is time to set up something with the synthetic
>>>>>>>>>>>>> source that works with just Beam as a dependency.
>>>>>>>>>>
>>>>>>>>>> I agree with this.
>>>>>>>>>>
>>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an
>>>>>>>>>> easy-to-use framework for using the synthetic source in
>>>>>>>>>> benchmarks?
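A pipeline that depends on nothing but Beam could look roughly like the
sketch below, built on the SDK's apache_beam.testing.synthetic_pipeline
module. The spec values are illustrative, and this only covers batch; a
streaming variant would still need an unbounded synthetic source, which is
part of what BEAM-8200 asks for:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.testing.synthetic_pipeline import SyntheticSource
    from apache_beam.testing.synthetic_pipeline import SyntheticStep

    # Illustrative spec: 10M records of 10-byte keys + 90-byte values (~1 GB).
    source_spec = {
        'numRecords': 10 * 1000 * 1000,
        'keySizeBytes': 10,
        'valueSizeBytes': 90,
    }

    with beam.Pipeline(options=PipelineOptions()) as p:
        _ = (p
             | 'Read' >> beam.io.Read(SyntheticSource(source_spec))
             | 'Step' >> beam.ParDo(SyntheticStep(
                 per_element_delay_sec=0.0005,  # simulate per-record work
                 per_bundle_delay_sec=0,
                 output_records_per_input_record=1,
                 output_filter_ratio=0))
             | 'GroupByKey' >> beam.GroupByKey())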
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not very
>>>>>>>>>>>>>> useful in this case because they do not go back more than a
>>>>>>>>>>>>>> month and are not very comprehensive. I do not see a jump
>>>>>>>>>>>>>> there. Thomas, would it be possible to bisect to find what
>>>>>>>>>>>>>> commit caused the regression?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any
>>>>>>>>>>>>>> Python-on-Flink benchmarks for the Chicago example?
>>>>>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou
>>>>>>>>>>>>>> <yifan...@google.com> It would be good to have alerts on
>>>>>>>>>>>>>> benchmarks. Do we have such an ability today?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as
>>>>>>>>>>>>>>> part of release verification (or otherwise as well)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see what appears to be a regression in master (compared
>>>>>>>>>>>>>>> to 2.14) with our in-house application (~25% jump in CPU
>>>>>>>>>>>>>>> utilization and a corresponding drop in throughput).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I wanted to see if there is anything available to verify
>>>>>>>>>>>>>>> that within Beam.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Thomas
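For reference, the "streaming with state and timers" workload mentioned
earlier in the thread corresponds, schematically, to a keyed stateful DoFn
like the sketch below. The buffering logic and the 60-second event-time
flush are illustrative, not the actual in-house pipeline:

    import apache_beam as beam
    from apache_beam.coders import VarIntCoder
    from apache_beam.transforms.timeutil import TimeDomain
    from apache_beam.transforms.userstate import BagStateSpec
    from apache_beam.transforms.userstate import TimerSpec
    from apache_beam.transforms.userstate import on_timer


    class BufferingDoFn(beam.DoFn):
        """Buffers values per key, flushing when a watermark timer fires."""
        BUFFER = BagStateSpec('buffer', VarIntCoder())
        FLUSH = TimerSpec('flush', TimeDomain.WATERMARK)

        def process(self,
                    element,  # state/timers require keyed (k, v) input
                    timestamp=beam.DoFn.TimestampParam,
                    buffer=beam.DoFn.StateParam(BUFFER),
                    flush_timer=beam.DoFn.TimerParam(FLUSH)):
            _, value = element
            buffer.add(value)
            # Illustrative policy: flush 60s past this element's event time.
            flush_timer.set(timestamp + 60)

        @on_timer(FLUSH)
        def flush(self, buffer=beam.DoFn.StateParam(BUFFER)):
            yield sum(buffer.read())
            buffer.clear()

Applied as `keyed_pcoll | beam.ParDo(BufferingDoFn())`; exercising this
path against a synthetic unbounded source is exactly the gap that
BEAM-8200 is meant to close.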