[ 
https://issues.apache.org/jira/browse/BEAM-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696305#comment-16696305
 ] 

Braden Bassingthwaite commented on BEAM-6117:
---------------------------------------------

I've added some screenshots of a dataflow that starts out fast, but then scales 
down and slows to a snail's pace.

 

The custom counters indicate when a query split has started and ended, and the 
number of rows emitted. I would expect the started count to reach 1024, with 
all query splits processing in parallel, but only about 4 or 5 are processed 
at a time.
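
For illustration only, here is a stdlib-only Go sketch of what counters like 
these track. It is not the pipeline's actual code and does not use the Beam 
metrics API. The names splitCounters, runSplits and inFlight, the parallelism 
of 5 and the 10 rows per split are all hypothetical, chosen to mirror the 
behaviour described above: 1024 splits, of which only a handful are in flight 
at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// splitCounters is a hypothetical stand-in for the custom counters:
// splits started, splits ended, and rows emitted.
type splitCounters struct {
	started int64
	ended   int64
	rows    int64
}

// inFlight reports how many splits have started but not yet ended.
func (c *splitCounters) inFlight() int64 {
	return atomic.LoadInt64(&c.started) - atomic.LoadInt64(&c.ended)
}

// processSplit simulates reading one query split that yields rowsPerSplit rows.
func (c *splitCounters) processSplit(rowsPerSplit int) {
	atomic.AddInt64(&c.started, 1)
	for i := 0; i < rowsPerSplit; i++ {
		atomic.AddInt64(&c.rows, 1)
	}
	atomic.AddInt64(&c.ended, 1)
}

// runSplits processes numSplits splits with at most parallelism of them
// running at the same time (assumed cap, enforced with a semaphore channel).
func runSplits(numSplits, parallelism, rowsPerSplit int) *splitCounters {
	c := &splitCounters{}
	sem := make(chan struct{}, parallelism)
	var wg sync.WaitGroup
	for i := 0; i < numSplits; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while `parallelism` splits are in flight
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			c.processSplit(rowsPerSplit)
		}()
	}
	wg.Wait()
	return c
}

func main() {
	c := runSplits(1024, 5, 10)
	fmt.Printf("started=%d ended=%d rows=%d inFlight=%d\n",
		c.started, c.ended, c.rows, c.inFlight())
}
```

With fully parallel processing, started would jump to 1024 almost at once 
while ended catches up. With a small cap, started minus ended never exceeds 
the cap, which matches what the screenshots show.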

> Dataflow Slowness
> -----------------
>
>                 Key: BEAM-6117
>                 URL: https://issues.apache.org/jira/browse/BEAM-6117
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>            Reporter: Braden Bassingthwaite
>            Assignee: Robert Burke
>            Priority: Major
>         Attachments: Screen Shot 2018-11-22 at 7.08.08 PM.png, Screen Shot 
> 2018-11-22 at 7.08.32 PM.png, Screen Shot 2018-11-22 at 7.11.33 PM.png
>
>
> This is a pretty open-ended ticket, but we've been struggling with this for 
> quite some time and are hoping we can get assistance in getting our issue 
> resolved.
>  
> We wrote and contributed the datastore reader earlier this year and have been 
> using it in our project in a couple of scenarios with success. The problem 
> that we are facing is that our dataflows take a long time. We have datastore 
> kinds that are 100M+ and they take 2-3 days to go over. We've tried fiddling 
> with all of the knobs available to us (datastore splits, CPUs, turning off 
> autoscaling, scope changes, updating libraries, etc.) and can't seem to 
> make it go faster.
> My only hunch comes from viewing the status of the datastore reader step in 
> the Dataflow UI, where we see:
> Output collections
> DailyListingScore/main.queryFn.out0
> Elements added 
> –
> Estimated size 
> –
> I am assuming that these numbers indicate to Dataflow the progress that the 
> step is making, and that it scales up or down depending on them. Is this 
> right? Or do these numbers have no bearing? We've tried starting the 
> dataflow with 32+ workers and it will always scale down to 1-2 nodes after a 
> couple of minutes. It seems as though dataflow isn't scaling up when it 
> should. Any directions or assistance in getting this issue solved would be 
> great!
>  
> Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
