[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029183#comment-17029183 ]
Valentyn Tymofieiev commented on BEAM-9085:
-------------------------------------------
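I did some profiling on the direct runner using cProfile. Almost all of the
5.39 s total is spent blocked in {method 'acquire' of '_thread.lock' objects},
so the direct runner slowdown might be caused by some inefficiency in the
threading model, or by another concurrency issue.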
{noformat}
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 ...
     16    5.278    0.330    5.278    0.330  {method 'acquire' of '_thread.lock' objects}
      2    0.000    0.000    5.279    2.640  fn_api_runner.py:2128(process_bundle)
      2    0.000    0.000    5.281    2.640  fn_api_runner.py:803(_run_stage)
      1    0.000    0.000    5.317    5.317  fn_api_runner.py:575(run_stages)
      1    0.000    0.000    5.319    5.319  fn_api_runner.py:506(run_via_runner_api)
      1    0.000    0.000    5.337    5.337  fn_api_runner.py:462(run_pipeline)
      1    0.000    0.000    5.346    5.346  direct_runner.py:125(run_pipeline)
      1    0.000    0.000    5.392    5.392  <string>:1(<module>)
      1    0.000    0.000    5.392    5.392  test_pipeline.py:109(run)
    2/1    0.000    0.000    5.392    5.392  pipeline.py:453(run)
   42/1    0.000    0.000    5.392    5.392  {built-in method builtins.exec}
{noformat}
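A minimal sketch of how to quantify what the table above shows: the share of
profiled time that the calling thread spends blocked in
_thread.lock.acquire (here 5.278 s of 5.392 s, about 98%). The helpers
wait_on_lock and lock_wait_share are hypothetical; wait_on_lock simply blocks
on an already-held lock until a timeout expires, standing in for a driver
thread that waits for a worker to finish a bundle. A high acquire share only
says that the profiled thread was waiting, not what it was waiting for: in the
CPython versions current at the time (e.g. 3.7), cProfile hooks only the
thread that enabled it, so work done on other threads does not appear in such
a table.

```python
import cProfile
import io
import pstats
import threading


def wait_on_lock(seconds):
    """Hypothetical workload: block for about `seconds` inside lock.acquire.

    Stands in for a driver thread waiting on a worker thread. The lock is
    non-reentrant, so the second acquire blocks until the timeout expires.
    """
    lock = threading.Lock()
    lock.acquire()
    lock.acquire(timeout=seconds)  # returns False after ~`seconds`
    lock.release()


def lock_wait_share(func, *args):
    """Hypothetical helper: profile func(*args) in the current thread.

    Returns (share, report): share is the fraction of total profiled time
    (sum of tottime) spent inside lock acquire calls; report is the top of
    the stats table sorted by cumulative time, as in the profile above.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    stats.print_stats(10)

    # stats.stats maps (file, line, name) -> (prim calls, calls, tottime,
    # cumtime, callers); C methods like lock.acquire have file '~'.
    total_tt = sum(entry[2] for entry in stats.stats.values())
    acquire_tt = sum(
        tt
        for (_, _, name), (_, _, tt, _, _) in stats.stats.items()
        if "acquire" in name and "lock" in name
    )
    return acquire_tt / total_tt, out.getvalue()
```

For example, share, report = lock_wait_share(wait_on_lock, 0.3) should give a
share close to 1, and print(report) prints a table in the same format as the
one above.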
It is possible that on the Dataflow runner the slowdown manifests in a
different way, since Dataflow does not use fn_api_runner.py.
> Investigate performance difference between Python 2/3 on Dataflow
> -----------------------------------------------------------------
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Kamil Wasilewski
> Assignee: Valentyn Tymofieiev
> Priority: Major
>
> Tests show that core Beam operations in Python 3.x on Dataflow can be a
> few times slower than in Python 2.7. We should investigate what is causing
> this.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
--
This message was sent by Atlassian Jira
(v8.3.4#803005)