[ 
https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035924#comment-17035924
 ] 

Valentyn Tymofieiev edited comment on BEAM-9085 at 2/13/20 5:10 AM:
--------------------------------------------------------------------

Per [1]: "RandomState provides access to legacy generators. This generator is 
considered frozen and will have no further improvements. It is guaranteed to 
produce the same values as the final point release of NumPy v1.16. [...] This 
class should only be used if it is essential to have randoms that are identical 
to what would have been produced by previous versions of NumPy."

[~kamilwu], would you have time to investigate which replacement generator we 
can use for consistent performance on 1.16.5 and 1.18.1? If none is 
available, we can downgrade numpy by supplying a requirements.txt file to the 
Dataflow job in the performance tests you are running, but there may be a 
better option.

[1]  https://numpy.org/doc/1.18/reference/random/legacy.html
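
A minimal sketch of how the two generator APIs could be compared on a given 
NumPy version. The workload (drawing random bytes), the helper names and the 
sizes are assumptions made for illustration and are not taken from the Beam 
load tests. np.random.default_rng only exists from NumPy 1.17 onwards, so the 
sketch falls back gracefully on 1.16.5.

```python
import timeit

import numpy as np


def make_legacy(seed=0):
    """Legacy generator; available on both NumPy 1.16.x and 1.18.x."""
    return np.random.RandomState(seed)


def make_replacement(seed=0):
    """New-style Generator where available (NumPy >= 1.17), else None."""
    if hasattr(np.random, "default_rng"):
        return np.random.default_rng(seed)
    return None


def time_bytes(rng, size=1024, repeats=1000):
    """Return total seconds for `repeats` calls of rng.bytes(size)."""
    # Hypothetical workload: the real load test may draw data differently.
    return timeit.timeit(lambda: rng.bytes(size), number=repeats)


if __name__ == "__main__":
    print("numpy", np.__version__)
    print("RandomState:", time_bytes(make_legacy()))
    replacement = make_replacement()
    if replacement is None:
        print("default_rng not available on this NumPy version")
    else:
        print("Generator:  ", time_bytes(replacement))
```

Running this under both 1.16.5 and 1.18.1 would show whether either API has 
stable timings across the two versions. If neither does, pinning is the 
fallback: a requirements.txt containing the line numpy==1.16.5, passed to the 
job through Beam's --requirements_file pipeline option.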


> Investigate performance difference between Python 2/3 on Dataflow
> -----------------------------------------------------------------
>
>                 Key: BEAM-9085
>                 URL: https://issues.apache.org/jira/browse/BEAM-9085
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Kamil Wasilewski
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few times slower than in Python 2.7. We should investigate 
> the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)