[jira] [Commented] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow

Valentyn Tymofieiev (Jira) Mon, 03 Feb 2020 10:50:35 -0800


    [ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029183#comment-17029183 ]


Valentyn Tymofieiev commented on BEAM-9085:
-------------------------------------------

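I did some profiling on the direct runner using cProfile. The direct runner 
slowdown might be caused by an inefficiency in the threading model or by some 
other concurrency issue: almost all of the run's cumulative time (5.278 s out 
of 5.392 s) is spent acquiring '_thread.lock' objects.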

{noformat}

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
       16    5.278    0.330    5.278    0.330 {method 'acquire' of '_thread.lock' objects}
        2    0.000    0.000    5.279    2.640 fn_api_runner.py:2128(process_bundle)
        2    0.000    0.000    5.281    2.640 fn_api_runner.py:803(_run_stage)
        1    0.000    0.000    5.317    5.317 fn_api_runner.py:575(run_stages)
        1    0.000    0.000    5.319    5.319 fn_api_runner.py:506(run_via_runner_api)
        1    0.000    0.000    5.337    5.337 fn_api_runner.py:462(run_pipeline)
        1    0.000    0.000    5.346    5.346 direct_runner.py:125(run_pipeline)
        1    0.000    0.000    5.392    5.392 <string>:1(<module>)
        1    0.000    0.000    5.392    5.392 test_pipeline.py:109(run)
      2/1    0.000    0.000    5.392    5.392 pipeline.py:453(run)
     42/1    0.000    0.000    5.392    5.392 {built-in method builtins.exec}

{noformat}

On the Dataflow runner, the slowdown may manifest in a different way, since 
Dataflow does not use fn_api_runner.py.
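
For anyone who wants to collect a similar profile, here is a minimal sketch 
using cProfile and pstats. The workload (wait_on_worker) and the helper 
(profile_to_text) are hypothetical stand-ins made up for illustration, not 
the Beam pipeline or the exact commands used for the numbers above. Because 
cProfile only measures the thread in which it is enabled, time the main 
thread spends waiting on a worker thread is attributed to lock acquisition, 
which is the same pattern as in the output above.

```python
import cProfile
import io
import pstats
import threading
import time


def wait_on_worker(delay=0.2):
    """Hypothetical workload: the main thread blocks until a worker finishes."""
    done = threading.Event()

    def worker():
        time.sleep(delay)  # simulated work done off the main thread
        done.set()

    thread = threading.Thread(target=worker)
    thread.start()
    done.wait()  # the main thread blocks here, inside a lock acquire
    thread.join()


def profile_to_text(func, limit=15):
    """Profile func in the current thread and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the blocking call chain is listed first.
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


report = profile_to_text(wait_on_worker)
print(report)
```

In the printed report, the 'acquire' entry should account for most of the 
total time, even though the actual work happens in the worker thread, which 
this profiler instance does not observe.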

> Investigate performance difference between Python 2/3 on Dataflow
> -----------------------------------------------------------------
>
>                 Key: BEAM-9085
>                 URL: https://issues.apache.org/jira/browse/BEAM-9085
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Kamil Wasilewski
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few times slower than in Python 2.7. We should investigate 
> what the cause of the problem is.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
