[ 
https://issues.apache.org/jira/browse/BEAM-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002132#comment-16002132
 ] 

Anant Bhandarkar edited comment on BEAM-2208 at 5/9/17 6:22 AM:
----------------------------------------------------------------

[~altay] This word count job was run yesterday.
2017-05-08_02_48_51-5929018952297525369

We tried setting the number of worker instances to 50 instead of relying on 
autoscaling, but the job used at most 2 workers and took 34 min 54 sec to run.

What would ensure that the work is distributed among the workers? And what 
could cause such a large difference in execution time compared to Java in a 
simple word count scenario?
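
For reference, a minimal sketch of how a fixed worker count is usually requested 
from the Python SDK on the command line. The helper name fixed_worker_args is 
hypothetical and used only for illustration. The flags --num_workers and 
--autoscaling_algorithm are the worker options documented for the Dataflow 
runner. Whether they match exactly what was passed for the job above, or how 
SDK 0.6.0 handles them, is an assumption.

```python
def fixed_worker_args(num_workers):
    """Return Dataflow worker flags that request a fixed-size worker pool.

    Hypothetical helper for illustration; only the flag names come from
    the Dataflow pipeline options.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [
        "--autoscaling_algorithm=NONE",    # disable autoscaling
        "--num_workers=%d" % num_workers,  # requested pool size
    ]


# Flags that would be appended to the word count command line.
args = fixed_worker_args(50)
print(" ".join(args))
```

These flags only request the pool size. They do not by themselves show how the 
input is split across the workers.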


> Python SDK wordcount on cloud Dataflow runner is slow
> -----------------------------------------------------
>
>                 Key: BEAM-2208
>                 URL: https://issues.apache.org/jira/browse/BEAM-2208
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow, sdk-py
>    Affects Versions: 0.6.0
>            Reporter: Anant Bhandarkar
>            Assignee: Ahmet Altay
>            Priority: Critical
>
> I have been trying to run the Beam word count example with a 2 GB file.
> When I run the Java word count example on this CSV file, the job 
> completes in 7.15 mins.
> Job ID: 2017-04-18_23_57_02-2832613177376293063
> The word count example using the Python SDK on the same file takes 28 to 
> 35 mins.
> Job ID: 2017-04-20_04_48_27-8924552896141769408
> SDK version: Apache Beam SDK for Python 0.6.0



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
