My environment has had all the dependencies installed, set up, and maintained organically over time as the project has evolved.
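For the Cython setup question below: a quick way to confirm whether the compiled code paths are actually active in a given environment is to check whether the relevant modules were loaded from native extensions. This is only a rough sketch; it assumes apache_beam.coders.coder_impl and apache_beam.runners.worker.opcounters are among the modules that get compiled when Cython is present at install time, so the module names should be double-checked against the SDK version in use.

import importlib


def is_compiled(module_name):
    # A Cython-compiled module is loaded from a native extension (.so/.pyd),
    # while the pure-Python fallback is loaded from a .py/.pyc file.
    module = importlib.import_module(module_name)
    return module.__file__.endswith(('.so', '.pyd'))


for name in ('apache_beam.coders.coder_impl',
             'apache_beam.runners.worker.opcounters'):
    print('%s compiled: %s' % (name, is_compiled(name)))

If these report False in a release validation environment, benchmarks would be exercising the slower pure-Python paths, which could easily masquerade as a regression.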
On Wed, Sep 25, 2019 at 9:56 AM Thomas Weise <t...@apache.org> wrote:

> The issue was related to how we build our custom packages.

> However, what might help users is documentation about the Cython setup, which is currently missing from the Python SDK docs.

> I'm also wondering how folks set up their environment for releases. Is it manual? Or is there a container that has all dependencies preinstalled?

> Thanks,
> Thomas

> On Wed, Sep 25, 2019 at 7:14 AM Valentyn Tymofieiev <valen...@google.com> wrote:

>> Thank you. In case there are details that would be relevant for others in the community to avoid similar regressions, feel free to share them. We also have Cython experts here who may be able to advise.

>> On Wed, Sep 25, 2019 at 6:58 AM Thomas Weise <t...@apache.org> wrote:

>>> After running through the entire bisect based on the 2.16 release branch, I found that the regression was caused by our own Cython setup. So green light for the 2.16.0 release.

>>> Thomas

>>> On Tue, Sep 17, 2019 at 1:21 PM Thomas Weise <t...@apache.org> wrote:

>>>> Hi Valentyn,

>>>> Thanks for the reminder. The bisect is on my TODO list.

>>>> Hopefully this week.

>>>> I saw the discussion about declaring 2.16 LTS. We probably need to sort these performance concerns out prior to doing so.

>>>> Thomas

>>>> On Tue, Sep 17, 2019 at 12:02 PM Valentyn Tymofieiev <valen...@google.com> wrote:

>>>>> Hi Thomas,

>>>>> Just a reminder that 2.16.0 was cut and the voting may start soon, so to keep the regression you reported from blocking the vote, it would be great to start investigating whether it is reproducible.

>>>>> Thanks,
>>>>> Valentyn

>>>>> On Tue, Sep 10, 2019 at 1:53 PM Valentyn Tymofieiev <valen...@google.com> wrote:

>>>>>> Thomas, did you have a chance to open a Jira for the streaming regression you observe? If not, could you please do so and cc +Ankur Goenka <goe...@google.com>? I talked with Ankur offline and he is also interested in this regression.

>>>>>> I opened:
>>>>>> - https://issues.apache.org/jira/browse/BEAM-8198 for the batch regression.
>>>>>> - https://issues.apache.org/jira/browse/BEAM-8199 to improve tooling around performance monitoring.
>>>>>> - https://issues.apache.org/jira/browse/BEAM-8200 to add benchmarks for streaming.

>>>>>> I cc'ed some folks, however not everyone. Manisha, I could not find your username in Jira; feel free to cc or assign BEAM-8199 <https://issues.apache.org/jira/browse/BEAM-8199> to yourself if that is something you are actively working on.

>>>>>> Thanks,
>>>>>> Valentyn

>>>>>> On Mon, Sep 9, 2019 at 9:59 AM Mark Liu <mark...@google.com> wrote:

>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It would be good to have alerts on benchmarks. Do we have such an ability today?

>>>>>>> As for regression detection, we have a Jenkins job beam_PerformanceTests_Analysis <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/> which analyzes metrics in BigQuery and reports a summary to the job console output. However, not all jobs are registered with this analyzer, and no further alerting (e.g. email / Slack) is currently integrated with it.

>>>>>>> There is ongoing work to add alerting to benchmarks.
>>>>>>> Kasia and Kamil are investigating Prometheus + Grafana, and Manisha and I are looking into mako.dev.

>>>>>>> Mark

>>>>>>> On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <al...@google.com> wrote:

>>>>>>>> I agree, let's investigate. Thomas, could you file JIRAs once you have additional information?

>>>>>>>> Valentyn, I think the performance regression could be investigated now, by running whatever benchmarks are available against 2.14, 2.15 and head and seeing if the same regression can be reproduced.

>>>>>>>> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <valen...@google.com> wrote:

>>>>>>>>> Sounds like these regressions need to be investigated ahead of the 2.16.0 release.

>>>>>>>>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <t...@apache.org> wrote:

>>>>>>>>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote:

>>>>>>>>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote:

>>>>>>>>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev <valentyn@google.com> wrote:

>>>>>>>>>>>>> +Mark Liu <mark...@google.com> has added some benchmarks running across multiple Python versions. Specifically, we run a 1 GB wordcount job on the Dataflow runner on Python 2.7 and 3.5-3.7. The benchmarks do not have alerting configured and, to my knowledge, are not actively monitored yet.

>>>>>>>>>>>> Are there any benchmarks for streaming? Streaming and batch are quite different runtime paths, and some of the issues can only be identified through metrics on longer-running processes. It would be good to verify utilization of memory, CPU, etc.

>>>>>>>>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory leak in the Python worker (Py 2.7).

>>>>>>>>>>> Do you have more details on this one?

>>>>>>>>>> Unfortunately only that at the moment. The workers eat up all memory and eventually crash. Reverted back to 2.14 / Py 3.6 and the issue is gone.

>>>>>>>>>>>>> Thomas, is it possible for you to do the bisection using SDK code from master at various commits to narrow down the regression on your end?

>>>>>>>>>>>> I don't know how soon I will get to it. It's of course possible, but expensive due to having to rebase the fork, build and deploy an entire stack of stuff for each iteration. The pipeline itself is super simple. We need this testbed as part of Beam. It would be nice to be able to pick an update and have more confidence that the baseline has not slipped.
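A side note on the worker memory leak mentioned above: a cheap way to narrow such leaks down without extra tooling is to log the worker's resident set size from inside the pipeline. A minimal sketch follows; the transform name, the element interval, and where to place it in the pipeline are arbitrary illustrative choices, and resource.getrusage reports peak RSS in kilobytes on Linux but bytes on macOS.

import logging
import resource

import apache_beam as beam


class LogPeakRss(beam.DoFn):
    """Pass-through DoFn that periodically logs the worker's peak RSS."""

    def __init__(self, every_n=100000):
        self._every_n = every_n
        self._count = 0

    def process(self, element):
        self._count += 1
        if self._count % self._every_n == 0:
            rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            logging.info('seen %d elements, peak RSS %d', self._count, rss)
        yield element

Dropping "| beam.ParDo(LogPeakRss())" after the suspect transform and watching whether the logged value keeps climbing as elements flow through makes it easier to tell a leak from ordinary steady-state memory use.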
>>>>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>>>>>>>>>> [2] https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>>>>>>>>>> [3] https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5

>>>>>>>>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote:

>>>>>>>>>>>>>> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks in different Python versions? Was there a recent change that is specific to Python 3.x?

>>>>>>>>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote:

>>>>>>>>>>>>>>> The issue is only visible with Python 3.6, not 2.7.

>>>>>>>>>>>>>>> If there is a framework in place to add a streaming test, that would be great. We would use what we have internally as a starting point.

>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> wrote:

>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> wrote:

>>>>>>>>>>>>>>>>> The workload is quite different. What I have is streaming with state and timers.

>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com> wrote:

>>>>>>>>>>>>>>>>>> We only recently started running the Chicago Taxi Example. +Michał Walenia <michal.wale...@polidea.com> I don't see it in the dashboards. Do you know if it's possible to see any trends in the data?

>>>>>>>>>>>>>>>>>> We have a few tests running now:
>>>>>>>>>>>>>>>>>> - Combine tests: https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>>>>>> - GBK tests: https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373

>>>>>>>>>>>>>>>>>> They don't seem to show a very drastic jump either, but they aren't very old.

>>>>>>>>>>>>>>>>>> There is also ongoing work by Kasia and Kamil (added) to add alerting for this sort of regression. The work is not there yet (it's in progress).

>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> -P.

>>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org> wrote:

>>>>>>>>>>>>>>>>>>> It probably won't be practical to do a bisect due to the high cost of each iteration with our fork/deploy setup.

>>>>>>>>>>>>>>>>>>> Perhaps it is time to set up something with the synthetic source that works with just Beam as a dependency.

>>>>>>>>>>>>>>>> I agree with this.

>>>>>>>>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an easy-to-use framework for using the synthetic source in benchmarks?
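On the synthetic source idea above: a self-contained load test that needs nothing but Beam as a dependency could look roughly like the sketch below. It assumes the SyntheticSource API in apache_beam.testing.synthetic_pipeline and the spec keys shown (numRecords / keySizeBytes / valueSizeBytes); both should be checked against the SDK version being tested, and the sizes are arbitrary illustrative values.

import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.testing.synthetic_pipeline import SyntheticSource


def run(argv=None):
    # ~1M records of ~100 bytes each, i.e. roughly 100 MB of synthetic input.
    input_spec = {
        'numRecords': 1000000,
        'keySizeBytes': 10,
        'valueSizeBytes': 90,
    }
    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (p
         | 'Read' >> beam.io.Read(SyntheticSource(input_spec))
         | 'GroupByKey' >> beam.GroupByKey()
         | 'Count' >> beam.combiners.Count.Globally()
         | 'Log' >> beam.Map(logging.info))


if __name__ == '__main__':
    run()

The same pipeline could then be pointed at the portable Flink runner or Dataflow via pipeline options, which would give a shared baseline that does not depend on anyone's internal fork.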
>>>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com> wrote:

>>>>>>>>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not very useful in this case because they do not go back more than a month and are not very comprehensive. I do not see a jump there. Thomas, would it be possible to bisect to find what commit caused the regression?

>>>>>>>>>>>>>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any Python-on-Flink benchmarks for the Chicago example?

>>>>>>>>>>>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It would be good to have alerts on benchmarks. Do we have such an ability today?

>>>>>>>>>>>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin

>>>>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org> wrote:

>>>>>>>>>>>>>>>>>>>>> Hi,

>>>>>>>>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as part of release verification (or otherwise as well)?

>>>>>>>>>>>>>>>>>>>>> I see what appears to be a regression in master (compared to 2.14) with our in-house application (~25% jump in CPU utilization and a corresponding drop in throughput).

>>>>>>>>>>>>>>>>>>>>> I wanted to see if there is anything available to verify that within Beam.

>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Thomas
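For reproducing the reported ~25% regression without Dataflow access, a coarse local comparison across SDK versions is sometimes enough to see a shift of that size. A minimal sketch: create one virtualenv per SDK version (2.14.0, 2.15.0, a wheel built from head), run the same small DirectRunner pipeline in each, and compare wall-clock times. The element count and repetition count below are arbitrary, and DirectRunner timings are only a rough proxy for the portable runner paths, so this is a smoke test rather than a replacement for the benchmarks discussed above.

import time

import apache_beam as beam


def run_once(num_elements=200000):
    # Time one run of a small shuffle-heavy pipeline on the local runner.
    start = time.time()
    with beam.Pipeline('DirectRunner') as p:
        (p
         | beam.Create(range(num_elements))
         | beam.Map(lambda x: (x % 1000, x))
         | beam.GroupByKey()
         | beam.combiners.Count.Globally())
    return time.time() - start


if __name__ == '__main__':
    # Take the best of a few runs to damp startup and warm-up noise.
    times = [run_once() for _ in range(3)]
    print('beam %s: best of %d runs: %.1fs'
          % (beam.__version__, len(times), min(times)))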