[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035924#comment-17035924 ]
Valentyn Tymofieiev edited comment on BEAM-9085 at 2/13/20 5:10 AM:
--------------------------------------------------------------------

Per [1]:

"The RandomState provides access to legacy generators. This generator is considered frozen and will have no further improvements. It is guaranteed to produce the same values as the final point release of NumPy v1.16. [...] This class should only be used if it is essential to have randoms that are identical to what would have been produced by previous versions of NumPy."

[~kamilwu], would you have time to investigate which replacement generator we can use for consistent performance on NumPy 1.16.5 and 1.18.1? If no such generator is available, we can downgrade numpy by supplying a requirements.txt file to the Dataflow job in the performance tests you are running, but there may be a better option.

[1] https://numpy.org/doc/1.18/reference/random/legacy.html

> Investigate performance difference between Python 2/3 on Dataflow
> -----------------------------------------------------------------
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Kamil Wasilewski
> Assignee: Valentyn Tymofieiev
> Priority: Major
>
> Tests show that the performance of core Beam operations in Python 3.x on
> Dataflow can be a few times slower than in Python 2.7. We should investigate
> what causes the problem.
> Currently, we have one ParDo test that runs in both Py3 and Py2 [1]. A
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
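The comment asks which generator could replace the legacy numpy.random.RandomState. One candidate is numpy.random.Generator, created with numpy.random.default_rng; it was added in NumPy 1.17, so it does not exist on 1.16.5. The sketch below times both generators on whichever NumPy is installed. It assumes the load tests' hot path amounts to drawing small random byte strings. That assumption, the helper names make_generators and time_generators, and the workload sizes are all hypothetical; nothing here is taken from the Beam load tests themselves.

```python
import timeit

import numpy as np


def make_generators(seed=42):
    """Return the random generators available in the installed NumPy.

    numpy.random.RandomState (legacy, frozen at NumPy 1.16 behavior) exists in
    every version. numpy.random.default_rng, which returns a
    numpy.random.Generator, only exists in NumPy >= 1.17.
    """
    gens = {"RandomState": np.random.RandomState(seed)}
    if hasattr(np.random, "default_rng"):
        gens["Generator"] = np.random.default_rng(seed)
    return gens


def time_generators(num_bytes=100, repeats=10000, seed=42):
    """Hypothetical micro-benchmark: total seconds spent drawing
    `num_bytes` random bytes `repeats` times with each generator."""
    results = {}
    for name, gen in make_generators(seed).items():
        # The lambda is timed right away, inside the same loop iteration,
        # so it always uses the current `gen`.
        results[name] = timeit.timeit(lambda: gen.bytes(num_bytes), number=repeats)
    return results


if __name__ == "__main__":
    print("NumPy " + np.__version__)
    for name, seconds in sorted(time_generators().items()):
        # str.format keeps this runnable on both Python 2.7 and 3.x.
        print("{}: {:.4f} s".format(name, seconds))
```

Running the same script under NumPy 1.16.5 and 1.18.1 (and under Python 2.7 and 3.x) would show whether RandomState itself got slower. Note that Generator and RandomState produce different streams for the same seed, so switching generators changes the generated data as well as its cost.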