Re: Possible Python SDK performance regression

Thomas Weise Tue, 17 Sep 2019 13:21:47 -0700

Hi Valentyn,

Thanks for the reminder. The bisect is on my TODO list.


Hopefully this week.

I saw the discussion about declaring 2.16 LTS. We probably need to sort
these performance concerns out prior to doing so.

Thomas


On Tue, Sep 17, 2019 at 12:02 PM Valentyn Tymofieiev <valen...@google.com>
wrote:

> Hi Thomas,
>
> Just a reminder that 2.16.0 was cut and soon the voting may start, so to
> avoid the regression that you reported blocking the vote, it would be great
> to start investigating it if it is reproducible.
>
> Thanks,
> Valentyn
>
> On Tue, Sep 10, 2019 at 1:53 PM Valentyn Tymofieiev <valen...@google.com>
> wrote:
>
>> Thomas, did you have a chance to open a Jira for the streaming regression
>> you observe? If not, could you please do so and cc +Ankur Goenka
>> <goe...@google.com> ? I talked with Ankur offline and he is also
>> interested in this regression.
>>
>> I opened:
>> - https://issues.apache.org/jira/browse/BEAM-8198 for batch regression.
>> - https://issues.apache.org/jira/browse/BEAM-8199 to improve tooling
>> around performance monitoring.
>> - https://issues.apache.org/jira/browse/BEAM-8200 to add benchmarks for
>> streaming.
>>
>> I cc'ed some folks, however not everyone. Manisha, I could not find your
>> username in Jira, feel free to cc or assign BEAM-8199
>> <https://issues.apache.org/jira/browse/BEAM-8199>  to yourself if that
>> is something you are actively working on.
>>
>> Thanks,
>> Valentyn
>>
>> On Mon, Sep 9, 2019 at 9:59 AM Mark Liu <mark...@google.com> wrote:
>>
>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It
>>>> would be good to have alerts on benchmarks. Do we have such an ability
>>>> today?
>>>>
>>>
>>> As for regression detection, we have a Jenkins job
>>> beam_PerformanceTests_Analysis
>>> <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/>
>>> which analyzes metrics in BigQuery and reports a summary to the job console
>>> output. However, not all jobs are registered with this analyzer, and
>>> currently no further alerts are integrated with it (e.g. email / Slack).
>>>
>>> There is ongoing work to add alerting to benchmarks. Kasia and Kamil
>>> are investigating Prometheus + Grafana, and Manisha and I are looking
>>> into mako.dev.
>>>
>>> Mark
>>>
>>> On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> I agree, let's investigate. Thomas, could you file JIRAs once you have
>>>> additional information?
>>>>
>>>> Valentyn, I think the performance regression could be investigated now
>>>> by running whatever benchmarks are available against 2.14, 2.15 and
>>>> head to see whether the same regression can be reproduced.
>>>>
>>>> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <valen...@google.com>
>>>> wrote:
>>>>
>>>>> Sounds like these regressions need to be investigated ahead of 2.16.0
>>>>> release.
>>>>>
>>>>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev
>>>>>>>> <valentyn@google.com> wrote:
>>>>>>>>
>>>>>>>>> +Mark Liu <mark...@google.com> has added some benchmarks running
>>>>>>>>> across multiple Python versions. Specifically, we run a 1 GB wordcount
>>>>>>>>> job on the Dataflow runner on Python 2.7 and 3.5-3.7. The benchmarks
>>>>>>>>> do not have alerting configured and, to my knowledge, are not actively
>>>>>>>>> monitored yet.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Are there any benchmarks for streaming? Streaming and batch are
>>>>>>>> quite different runtime paths, and some of the issues can only be
>>>>>>>> identified with longer-running processes through metrics. It would be
>>>>>>>> good to verify utilization of memory, CPU, etc.
>>>>>>>>
>>>>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory
>>>>>>>> leak in the Python worker (Py 2.7).
>>>>>>>>
>>>>>>>
>>>>>>> Do you have more details on this one?
>>>>>>>
>>>>>>
>>>>>> Unfortunately, only that at the moment. The workers eat up all memory
>>>>>> and eventually crash. Reverted back to 2.14 / Py 3.6 and the issue is
>>>>>> gone.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thomas, is it possible for you to do the bisection using SDK code
>>>>>>>>> from master at various commits to narrow down the regression on your
>>>>>>>>> end?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don't know how soon I will get to it. It's of course possible,
>>>>>>>> but expensive due to having to rebase the fork, build and deploy
>>>>>>>> an entire stack of stuff for each iteration. The pipeline itself is
>>>>>>>> super simple. We need this testbed as part of Beam. It would be nice
>>>>>>>> to be able to pick an update and have more confidence that the
>>>>>>>> baseline has not slipped.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>>>>>> [2]
>>>>>>>>> https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>>>>>> [3]
>>>>>>>>> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks
>>>>>>>>>> in different Python versions? Was there a recent change that is
>>>>>>>>>> specific to Python 3.x?
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>>>>>>>
>>>>>>>>>>> If there is a framework in place to add a streaming test, that
>>>>>>>>>>> would be great. We would use what we have internally as a starting
>>>>>>>>>>> point.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The workload is quite different. What I have is streaming with
>>>>>>>>>>>>> state and timers.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada
>>>>>>>>>>>>> <pabl...@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We only recently started running the Chicago Taxi Example.
>>>>>>>>>>>>>> +Michał Walenia <michal.wale...@polidea.com> I don't see it in
>>>>>>>>>>>>>> the dashboards. Do you know if it's possible to see any trends
>>>>>>>>>>>>>> in the data?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have a few tests running now:
>>>>>>>>>>>>>> - Combine tests:
>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>> - GBK tests:
>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> They don't seem to show a very drastic jump either, but they
>>>>>>>>>>>>>> aren't very old.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is also work ongoing to add alerting for this sort of
>>>>>>>>>>>>>> regression by Kasia and Kamil (added). The work is not there
>>>>>>>>>>>>>> yet (it's in progress).
>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>> -P.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It probably won't be practical to do a bisect due to the
>>>>>>>>>>>>>>> high cost of each iteration with our fork/deploy setup.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Perhaps it is time to set up something with the synthetic
>>>>>>>>>>>>>>> source that works with just Beam as a dependency.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>
>>>>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an easy-to-use
>>>>>>>>>>>> framework for using synthetic source in benchmarks?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not very
>>>>>>>>>>>>>>>> useful in this case because they do not go back more than a
>>>>>>>>>>>>>>>> month and are not very comprehensive. I do not see a jump
>>>>>>>>>>>>>>>> there. Thomas, would it be possible to bisect to find what
>>>>>>>>>>>>>>>> commit caused the regression?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any Python
>>>>>>>>>>>>>>>> on Flink benchmarks for the Chicago example?
>>>>>>>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou
>>>>>>>>>>>>>>>> <yifan...@google.com> It would be good to have alerts on
>>>>>>>>>>>>>>>> benchmarks. Do we have such an ability today?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as
>>>>>>>>>>>>>>>>> part of release verification (or otherwise as well)?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see what appears to be a regression in master (compared
>>>>>>>>>>>>>>>>> to 2.14) with our in-house application (~25% jump in CPU
>>>>>>>>>>>>>>>>> utilization and a corresponding drop in throughput).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I wanted to see if there is anything available to verify
>>>>>>>>>>>>>>>>> that within Beam.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
