Braden Bassingthwaite created BEAM-6117:
-------------------------------------------
Summary: Dataflow Slowness
Key: BEAM-6117
URL: https://issues.apache.org/jira/browse/BEAM-6117
Project: Beam
Issue Type: Bug
Components: sdk-go
Reporter: Braden Bassingthwaite
Assignee: Robert Burke
This is a pretty open-ended ticket, but we've been struggling with this for
quite some time and are hoping to get assistance in resolving our issue.
We wrote and contributed the datastore reader earlier this year and have been
using it in our project in a couple of scenarios with success. The problem
we are facing is that our Dataflow jobs take a long time. We have datastore kinds
that are 100M+ and they take 2-3 days to go over. We've tried fiddling with all
of the knobs available to us (datastore splits, CPUs, turning off autoscaling,
scope changes, updating libraries, etc.) and can't seem to make it go faster.
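For a rough sense of scale, the figures above can be turned into an average
throughput. This is only a back-of-the-envelope sketch: it assumes "100M+"
means roughly 100 million entities and that the 2-3 days are wall-clock time
for the whole job. The helper name entitiesPerSecond is made up for
illustration and is not part of the Beam SDK.

```go
package main

import "fmt"

// entitiesPerSecond returns the average throughput when `entities` items
// are processed over `days` days of wall-clock time.
func entitiesPerSecond(entities, days float64) float64 {
	const secondsPerDay = 24 * 60 * 60
	return entities / (days * secondsPerDay)
}

func main() {
	// Assumption: "100M+" is taken as 100 million entities.
	const entities = 100e6
	for _, days := range []float64{2, 3} {
		fmt.Printf("%.0f days: ~%.0f entities/sec\n", days, entitiesPerSecond(entities, days))
	}
}
```

Under those assumptions the job averages only a few hundred entities per
second in total, which seems low for a job started with 32+ workers.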
My only hunch comes from the datastore reader step. When viewing its status in
the Dataflow UI, we see:
Output collections
DailyListingScore/main.queryFn.out0
Elements added
–
Estimated size
–
I am assuming that these numbers indicate to Dataflow the progress that
the step is making, and that it scales up or down depending on them. Is this
right? Or do these numbers have no bearing? We've tried starting the
job with 32+ workers, and it always scales down to 1-2 nodes after a
couple of minutes. It seems as though Dataflow isn't scaling up when it should.
Any direction or assistance in getting this issue solved would be great!
Thanks