I agree, let's investigate. Thomas, could you file JIRAs once you have
additional information?

Valentyn, I think the performance regression could be investigated now, by
running whatever benchmarks are available against 2.14, 2.15, and head and
seeing if the same regression can be reproduced.
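
For a quick check, something as small as the following could serve as the
harness (untested sketch; the pipeline shape and sizes are placeholders, and
each SDK version would go in its own virtualenv):

  # Run once per virtualenv, e.g. after `pip install apache-beam==2.14.0`,
  # `pip install apache-beam==2.15.0`, or an SDK built from head.
  import time
  import apache_beam as beam

  def run_once(num_records=1000000):
      start = time.time()
      with beam.Pipeline('DirectRunner') as p:
          _ = (p
               | beam.Create(range(num_records))
               | beam.Map(lambda x: (x % 1000, 1))
               | beam.CombinePerKey(sum))
      return time.time() - start

  if __name__ == '__main__':
      print('wall time: %.1fs' % run_once())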

On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <valen...@google.com>
wrote:

> Sounds like these regressions need to be investigated ahead of the 2.16.0
> release.
>
> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <t...@apache.org> wrote:
>
>>
>>
>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote:
>>
>>>
>>>
>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev <valen...@google.com>
>>>> wrote:
>>>>
>>>>> +Mark Liu <mark...@google.com> has added some benchmarks running
>>>>> across multiple Python versions. Specifically, we run a 1 GB wordcount
>>>>> job on the Dataflow runner on Python 2.7 and 3.5-3.7. The benchmarks do
>>>>> not have alerting configured and to my knowledge are not actively
>>>>> monitored yet.
>>>>>
>>>>
>>>> Are there any benchmarks for streaming? Streaming and batch are quite
>>>> different runtime paths, and some issues can only be identified through
>>>> metrics on longer-running processes. It would be good to verify
>>>> utilization of memory, CPU, etc.
>>>>
>>>> I additionally discovered that our 2.16 upgrade exhibits a memory leak
>>>> in the Python worker (Py 2.7).
>>>>
>>>
>>> Do you have more details on this one?
>>>
>>
>> Unfortunately only that at the moment. The workers eat up all memory and
>> eventually crash. We reverted to 2.14 / Py 3.6 and the issue is gone.
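>>
>> In case it helps others reproduce: a quick way to watch for this (rough
>> sketch, names are illustrative) is to log the worker's max RSS from
>> inside a DoFn every N elements:
>>
>>   import logging
>>   import resource
>>   import apache_beam as beam
>>
>>   class LogMemory(beam.DoFn):
>>       """Logs max resident set size every `every` elements (Linux: KB)."""
>>       def __init__(self, every=10000):
>>           self._every = every
>>           self._count = 0
>>
>>       def process(self, element):
>>           self._count += 1
>>           if self._count % self._every == 0:
>>               rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>               logging.info('elements=%d max_rss_kb=%d', self._count, rss)
>>           yield element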
>>
>>
>>>
>>>
>>>>
>>>>
>>>>> Thomas, is it possible for you to do the bisection using SDK code from
>>>>> master at various commits to narrow down the regression on your end?
>>>>>
>>>>
>>>> I don't know how soon I will get to it. It's of course possible, but
>>>> expensive due to having to rebase the fork, build and deploy an entire
>>>> stack of stuff for each iteration. The pipeline itself is super simple. We
>>>> need this testbed as part of Beam. It would be nice to be able to pick an
>>>> update and have more confidence that the baseline has not slipped.
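>>>>
>>>> If we do end up bisecting, the per-iteration cost could at least be
>>>> scripted away with `git bisect run` and a step script roughly like this
>>>> (untested sketch; the threshold and benchmark command are placeholders,
>>>> and the baseline would come from a known-good 2.14 run):
>>>>
>>>>   # bisect_step.py -- used as:
>>>>   #   git bisect start HEAD v2.14.0
>>>>   #   git bisect run python bisect_step.py
>>>>   import subprocess
>>>>   import sys
>>>>   import time
>>>>
>>>>   THRESHOLD_SECONDS = 120  # assumed: good baseline plus some margin
>>>>
>>>>   # Install the SDK at the commit under test.
>>>>   subprocess.check_call(['pip', 'install', '--quiet', './sdks/python'])
>>>>
>>>>   start = time.time()
>>>>   subprocess.check_call(
>>>>       [sys.executable, '-m', 'apache_beam.examples.wordcount',
>>>>        '--input', 'input.txt', '--output', '/tmp/out'])
>>>>   elapsed = time.time() - start
>>>>   # Exit 0 = good commit, 1 = bad commit (slow), per git bisect run.
>>>>   sys.exit(0 if elapsed < THRESHOLD_SECONDS else 1)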
>>>>
>>>>
>>>>>
>>>>> [1]
>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>> [2]
>>>>> https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>> [3]
>>>>> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks on
>>>>>> different Python versions? Was there a recent change that is specific
>>>>>> to Python 3.x?
>>>>>>
>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>
>>>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>>>
>>>>>>> If there is a framework in place to add a streaming test, that would
>>>>>>> be great. We would use what we have internally as starting point.
>>>>>>>
>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> The workload is quite different. What I have is streaming with
>>>>>>>>> state and timers.
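>>>>>>>>>
>>>>>>>>> For reference, the general shape is a keyed DoFn along these lines
>>>>>>>>> (minimal sketch, not our actual code):
>>>>>>>>>
>>>>>>>>>   import apache_beam as beam
>>>>>>>>>   from apache_beam.coders import VarIntCoder
>>>>>>>>>   from apache_beam.transforms.timeutil import TimeDomain
>>>>>>>>>   from apache_beam.transforms.userstate import (
>>>>>>>>>       BagStateSpec, TimerSpec, on_timer)
>>>>>>>>>   from apache_beam.utils.timestamp import Duration, Timestamp
>>>>>>>>>
>>>>>>>>>   class BufferAndFlush(beam.DoFn):
>>>>>>>>>       BUFFER = BagStateSpec('buffer', VarIntCoder())
>>>>>>>>>       FLUSH = TimerSpec('flush', TimeDomain.REAL_TIME)
>>>>>>>>>
>>>>>>>>>       def process(self, element,
>>>>>>>>>                   buffer=beam.DoFn.StateParam(BUFFER),
>>>>>>>>>                   timer=beam.DoFn.TimerParam(FLUSH)):
>>>>>>>>>           unused_key, value = element
>>>>>>>>>           buffer.add(value)
>>>>>>>>>           # Flush buffered values ~10s from now (processing time).
>>>>>>>>>           timer.set(Timestamp.now() + Duration(seconds=10))
>>>>>>>>>
>>>>>>>>>       @on_timer(FLUSH)
>>>>>>>>>       def flush(self, buffer=beam.DoFn.StateParam(BUFFER)):
>>>>>>>>>           for value in buffer.read():
>>>>>>>>>               yield value
>>>>>>>>>           buffer.clear()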
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> We only recently started running the Chicago Taxi Example. +MichaƂ
>>>>>>>>>> Walenia <michal.wale...@polidea.com> I don't see it in the
>>>>>>>>>> dashboards. Do you know if it's possible to see any trends in the
>>>>>>>>>> data?
>>>>>>>>>>
>>>>>>>>>> We have a few tests running now:
>>>>>>>>>> - Combine tests:
>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>> - GBK tests:
>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>
>>>>>>>>>> They don't seem to show a very drastic jump either, but they
>>>>>>>>>> aren't very old.
>>>>>>>>>>
>>>>>>>>>> There is also ongoing work by Kasia and Kamil (added) to add
>>>>>>>>>> alerting for this sort of regression; it is not there yet (still
>>>>>>>>>> in progress).
>>>>>>>>>> Best
>>>>>>>>>> -P.
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> It probably won't be practical to do a bisect due to the high
>>>>>>>>>>> cost of each iteration with our fork/deploy setup.
>>>>>>>>>>>
>>>>>>>>>>> Perhaps it is time to set up something with the synthetic source
>>>>>>>>>>> that works with just Beam as a dependency.
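>>>>>>>>>>>
>>>>>>>>>>> Something like the following could be a starting point (rough
>>>>>>>>>>> sketch; the spec keys follow
>>>>>>>>>>> apache_beam.testing.synthetic_pipeline and may differ by
>>>>>>>>>>> version):
>>>>>>>>>>>
>>>>>>>>>>>   import apache_beam as beam
>>>>>>>>>>>   from apache_beam.testing.synthetic_pipeline import (
>>>>>>>>>>>       SyntheticSource)
>>>>>>>>>>>
>>>>>>>>>>>   with beam.Pipeline() as p:
>>>>>>>>>>>       _ = (p
>>>>>>>>>>>            | beam.io.Read(SyntheticSource({
>>>>>>>>>>>                'numRecords': 1000000,
>>>>>>>>>>>                'keySizeBytes': 10,
>>>>>>>>>>>                'valueSizeBytes': 90,
>>>>>>>>>>>            }))
>>>>>>>>>>>            | beam.GroupByKey()
>>>>>>>>>>>            | beam.combiners.Count.Globally())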
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> I agree with this.
>>>>>>>>
>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an easy-to-use
>>>>>>>> framework for using the synthetic source in benchmarks?
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> There are a few in this dashboard [1], but they are not very
>>>>>>>>>>>> useful in this case because they do not go back more than a month
>>>>>>>>>>>> and are not very comprehensive. I do not see a jump there. Thomas,
>>>>>>>>>>>> would it be possible to bisect to find which commit caused the
>>>>>>>>>>>> regression?
>>>>>>>>>>>>
>>>>>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any
>>>>>>>>>>>> Python-on-Flink benchmarks for the Chicago example?
>>>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou
>>>>>>>>>>>> <yifan...@google.com> It would be good to have alerts on
>>>>>>>>>>>> benchmarks. Do we have such a capability today?
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as part
>>>>>>>>>>>>> of release verification (or otherwise)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see what appears to be a regression in master (compared to
>>>>>>>>>>>>> 2.14) with our in-house application (~25% jump in CPU
>>>>>>>>>>>>> utilization and a corresponding drop in throughput).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I wanted to see if there is anything available to verify that
>>>>>>>>>>>>> within Beam.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>
>>>>>>>>>>>>>
