[jira] [Commented] (BEAM-2208) Apache Beam Python SDK is atleast 5 times slower

Anant Bhandarkar (JIRA) Mon, 08 May 2017 09:04:40 -0700

    [ 
https://issues.apache.org/jira/browse/BEAM-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000987#comment-16000987
 ]


Anant Bhandarkar commented on BEAM-2208:
----------------------------------------

It is an uncompressed file. We used the same uncompressed file with the Java SDK, 
which took 7 mins, versus 35 mins for Python. I am wondering why it is 5 times slower.
Would compressing it make it slower or faster?

Python is not a speedy language, but 5 times slower is quite a lot.

> Apache Beam Python SDK is atleast 5 times slower
> ------------------------------------------------
>
>                 Key: BEAM-2208
>                 URL: https://issues.apache.org/jira/browse/BEAM-2208
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow, sdk-py
>    Affects Versions: 0.6.0
>            Reporter: Anant Bhandarkar
>            Assignee: Ahmet Altay
>            Priority: Critical
>
> I have been trying to run the Beam word count example with a 2 GB file.
> When I run the Java word count example on this CSV file, the job completes 
> in 7.15 mins.
> Job ID: 2017-04-18_23_57_02-2832613177376293063
> But the word count example with the same file using the Python SDK takes 
> 28 to 35 mins.
> Job ID: 2017-04-20_04_48_27-8924552896141769408
> SDK version: Apache Beam SDK for Python 0.6.0



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
