To close the loop: the regression reported in this thread is not specific to
Beam or Dataflow. The difference in performance is caused by a 'regression'
in the deprecated numpy random number generator, which we use to generate
synthetic input for the load test pipeline. Since new releases of numpy
don't support Python 2, our Py2 tests use a different, older numpy
version in which that generator happens to be faster.
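
For anyone who wants to check this locally, here is a rough sketch of a
micro-benchmark. It assumes the generator in question is the legacy
numpy.random.RandomState interface. The function names, record counts and
record sizes are made up for illustration and are not taken from the Beam
load test code. Run it under each Python/numpy combination and compare the
timings.

```python
import timeit

import numpy as np


def generate_records(num_records, record_size, seed=0):
    """Hypothetical stand-in for synthetic input generation.

    Assumption: the load test draws random bytes from the legacy
    numpy.random.RandomState interface.
    """
    rng = np.random.RandomState(seed)
    return [rng.bytes(record_size) for _ in range(num_records)]


def time_generation(num_records=10000, record_size=100, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs."""
    timings = timeit.repeat(
        lambda: generate_records(num_records, record_size),
        number=1,
        repeat=repeats,
    )
    return min(timings)


if __name__ == "__main__":
    print("numpy version: %s" % np.__version__)
    print("best of 3: %.4f s" % time_generation())
```

A large gap between the two environments in this isolated test would point
at the generator itself rather than at the runner or the container setup.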

You can follow BEAM-9085 for further details.

On Fri, Jan 10, 2020 at 9:26 AM Valentyn Tymofieiev <valen...@google.com>
wrote:

> Thanks, Kamil. I self-assigned the issue, but if anyone else is
> interested, feel free to take a look in parallel and post your findings on
> the Jira.
>
> On Fri, Jan 10, 2020 at 4:29 AM Kamil Wasilewski <
> kamil.wasilew...@polidea.com> wrote:
>
>> Our first Python3 performance test has just been implemented and we have
>> just started gathering results. Here[1] you can find dashboards with a
>> side-by-side comparison.
>> I also opened a Jira ticket to investigate the difference [2]. Anyone,
>> please feel free to assign it to yourself.
>>
>> Thanks,
>> Kamil
>>
>> [1]
>> https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
>> [2] https://issues.apache.org/jira/browse/BEAM-9085
>>
>> On Mon, Dec 9, 2019 at 8:38 PM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> For now we should run Py3 and Py2 tests alongside each other to get a
>>> side-by-side comparison. I suggest we open a Jira ticket to investigate the
>>> difference in performance. We have limited performance test coverage on
>>> Python 3 in Beam, so more Py3 tests would help a lot here, thanks for
>>> adding them.
>>>
>>> On Fri, Dec 6, 2019 at 9:43 AM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> This is very surprising--I would expect the times to be quite similar. Do
>>>> you have profiles for where the (difference in) time is spent? With
>>>> differences like these, I wonder if there are issues with container
>>>> setup (e.g. some things not being installed or cached) for Python 3.
>>>>
>>>> On Fri, Dec 6, 2019 at 9:06 AM Kamil Wasilewski
>>>> <kamil.wasilew...@polidea.com> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Python 2.7 won't be maintained past 2020 and that's why we want to
>>>> migrate all Python performance tests in Beam from Python 2.7 to Python 3.7.
>>>> However, I was surprised by seeing that after switching Dataflow tests to
>>>> Python 3.x they are a few times slower. For example, the same ParDo test
>>>> that takes approx. 8 minutes to run on Python 2.7 needs approx. 21 minutes
>>>> on Python 3.x. You can find all the results I gathered and the setup here.
>>>> >
>>>> > Do you know any possible reason for this? This issue makes it
>>>> impossible to do the migration, because of the limited resources on Jenkins
>>>> (almost every job would be aborted).
>>>> >
>>>> > Thanks,
>>>> > Kamil
>>>>
>>>
