[
https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035924#comment-17035924
]
Valentyn Tymofieiev commented on BEAM-9085:
-------------------------------------------
Per [1]: "The RandomState provides access to legacy generators. This generator is
considered frozen and will have no further improvements. It is guaranteed to
produce the same values as the final point release of NumPy v1.16. [...] This
class should only be used if it is essential to have randoms that are identical
to what would have been produced by previous versions of NumPy."
[~kamilwu], would you have time to check which replacement generator we can use
for consistent performance on NumPy 1.16.5 and 1.18.1? If none is available, we
can downgrade numpy by supplying a requirements.txt file to the Dataflow job in
the performance tests you are running, but there may be a better option.
[1] https://numpy.org/doc/1.18/reference/random/legacy.html
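A minimal sketch of how a replacement generator could be selected and timed on
both NumPy versions. The helper names make_rng and time_random_bytes are
hypothetical, and generating random bytes is only an assumed stand-in for
whatever the load test actually draws; this is not the code used in the Beam
performance tests.

```python
import timeit

import numpy as np


def make_rng(seed=None):
    """Return a random source usable on both older and newer NumPy releases.

    Hypothetical helper for illustration, not part of Beam.
    """
    # np.random.default_rng was added in NumPy 1.17, so it is missing on 1.16.x.
    if hasattr(np.random, "default_rng"):
        return np.random.default_rng(seed)
    # Fall back to the legacy generator when the new API is unavailable.
    return np.random.RandomState(seed)


def time_random_bytes(rng, size=1000, repeats=1000):
    """Return the seconds taken to generate `size` random bytes `repeats` times."""
    # Assumption: random bytes approximate the workload of interest.
    return timeit.timeit(lambda: rng.bytes(size), number=repeats)


if __name__ == "__main__":
    legacy = np.random.RandomState(42)
    current = make_rng(42)
    print("numpy", np.__version__)
    print("RandomState:", time_random_bytes(legacy))
    print("make_rng:   ", time_random_bytes(current))
```

Running the script under each NumPy version would show whether the legacy and
new generators differ in speed for this particular call; other distributions
may behave differently.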
> Investigate performance difference between Python 2/3 on Dataflow
> -----------------------------------------------------------------
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Kamil Wasilewski
> Assignee: Valentyn Tymofieiev
> Priority: Major
>
> Tests show that the performance of core Beam operations in Python 3.x on
> Dataflow can be a few times slower than in Python 2.7. We should investigate
> the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536