[jira] [Commented] (BEAM-6117) Dataflow Slowness

Robert Burke (JIRA) Thu, 22 Nov 2018 17:36:27 -0800


    [ 
https://issues.apache.org/jira/browse/BEAM-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696309#comment-16696309
 ]


Robert Burke commented on BEAM-6117:
------------------------------------

Hi Braden!

Just a quick note, since I don't want to leave you hanging all weekend.
Thanks for your interest!

The Go SDK is still experimental at this time, and critically there's a
missing piece in Beam that would prevent the issues you're describing:
Portable Splittable DoFns. Once that interface layer exists, your connector
would need some modifications so it can report estimates and similar
information to the runner (such as Dataflow).
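
To give a rough idea of what "relating estimates to the runner" means, here
is a minimal sketch in plain Go. It is not the Beam Go SDK API (that
interface layer doesn't exist yet); OffsetRange, Tracker, TryClaim, Progress
and TrySplit are hypothetical names chosen for illustration. It assumes the
work can be described as a range of positions, so the reader can say how much
is done, how much remains, and hand part of the remainder to another worker.

```go
package main

import (
	"errors"
	"fmt"
)

// OffsetRange is a hypothetical restriction: the half-open range [Start, End).
type OffsetRange struct {
	Start, End int64
}

// Tracker is a hypothetical progress tracker over one OffsetRange.
// It is an illustration only, not a type from the Beam Go SDK.
type Tracker struct {
	rest OffsetRange
	next int64 // first position that has not been claimed yet
}

// NewTracker starts tracking at the beginning of the restriction.
func NewTracker(r OffsetRange) *Tracker {
	return &Tracker{rest: r, next: r.Start}
}

// TryClaim marks pos as processed. It returns false when pos was already
// passed or lies outside the (possibly shrunken) restriction.
func (t *Tracker) TryClaim(pos int64) bool {
	if pos < t.next || pos >= t.rest.End {
		return false
	}
	t.next = pos + 1
	return true
}

// Progress reports the estimate a runner could use for scaling decisions.
func (t *Tracker) Progress() (done, remaining int64) {
	return t.next - t.rest.Start, t.rest.End - t.next
}

// TrySplit keeps the given fraction of the remaining work and returns the
// rest as a residual range that another worker could process.
func (t *Tracker) TrySplit(fraction float64) (OffsetRange, error) {
	if fraction < 0 || fraction > 1 {
		return OffsetRange{}, errors.New("fraction must be in [0, 1]")
	}
	remaining := t.rest.End - t.next
	splitAt := t.next + int64(float64(remaining)*fraction)
	if splitAt >= t.rest.End {
		return OffsetRange{}, errors.New("nothing left to split off")
	}
	residual := OffsetRange{Start: splitAt, End: t.rest.End}
	t.rest.End = splitAt // the primary now stops where the residual begins
	return residual, nil
}

func main() {
	tr := NewTracker(OffsetRange{Start: 0, End: 100})
	for i := int64(0); i < 10; i++ {
		tr.TryClaim(i)
	}
	done, remaining := tr.Progress()
	fmt.Println("done:", done, "remaining:", remaining)

	residual, err := tr.TrySplit(0.5)
	fmt.Println("residual:", residual, "err:", err)
}
```

Without something along these lines, a reader step looks like a single opaque
unit of work, so the runner has nothing to base a scaling or splitting
decision on.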

Please see the roadmap for some more information:
https://beam.apache.org/roadmap/go-sdk/

Further, while pipelines written with the Apache Beam Go SDK can function
on Dataflow, the Go SDK is not yet officially supported by that service.

Thanks for your interest in the Go SDK, and for your patience while these
last bits get into place.

Robert B

On Thu, Nov 22, 2018, 5:13 PM Braden Bassingthwaite (JIRA) <[email protected]>



> Dataflow Slowness
> -----------------
>
>                 Key: BEAM-6117
>                 URL: https://issues.apache.org/jira/browse/BEAM-6117
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>            Reporter: Braden Bassingthwaite
>            Assignee: Robert Burke
>            Priority: Major
>         Attachments: Screen Shot 2018-11-22 at 7.08.08 PM.png, Screen Shot 
> 2018-11-22 at 7.08.32 PM.png, Screen Shot 2018-11-22 at 7.11.33 PM.png
>
>
> This is a pretty open-ended ticket, but we've been struggling with this for 
> quite some time and are hoping we can get assistance in getting our issue 
> resolved.
>  
> We wrote and contributed the datastore reader earlier this year and have been 
> using it in our project in a couple of scenarios with success. The problem 
> we are facing is that our dataflows take a long time. We have datastore 
> kinds that are 100M+ and they take 2-3 days to go over. We've tried fiddling 
> with all of the knobs available to us (datastore splits, CPUs, turning off 
> autoscaling, scope changes, updating libraries, etc.) and can't seem to 
> make it go faster.
> My only hunch comes from the status of the datastore reader step in the 
> Dataflow UI, where we see:
> Output collections
> DailyListingScore/main.queryFn.out0
> Elements added 
> –
> Estimated size 
> –
> I am assuming that these numbers would indicate to Dataflow the progress that 
> the step is making, and that it would scale up/down depending on them. Is this 
> right? Or would these numbers have no bearing? We've tried starting the 
> dataflow with 32+ workers and it will always scale down to 1-2 nodes after a 
> couple of minutes. It seems as though Dataflow isn't scaling up when it 
> should. Any direction or assistance in getting this issue solved would be 
> great!
>  
> Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
