Pavlo Pohrrebnyi created BEAM-7403:
--------------------------------------

             Summary: BigQueryIO.Write does not autoscale correctly (idle 
workers)
                 Key: BEAM-7403
                 URL: https://issues.apache.org/jira/browse/BEAM-7403
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Pavlo Pohrrebnyi


Apache Beam version:
2.10
JAVA SDK
Dataflow GCP Staged

Details:
We have a streaming dataflow which ingests data into BigQuery (Streaming 
Inserts).
We deploy a job with max number of workers = 40 and
there is a huge backlog already (high watermark).

When the dataflow starts it scales 0 -> 3 (from 0 to 3 workers)
and starts ingesting with 12000 messages/sec rate.
After 2 mins it scales 3 -> 40 to keep up with a backlog.
After scaling up, the rate never goes higher than it was with 3 nodes (12000 
messages/sec).

We have memory consumption metrics in Stackdriver; from them
we see that the first 3 workers consume about 5GB of RAM and the rest 37 workers
consume about 0.2GB RAM. It appears that these autoscaled Nodes are idle? 
Importantly, they don’t add to Streaming Inserts process for BigQuery.

Autoscaling in the other streaming pipelines we have works fine.
It appears that this is related to BigQuery streaming inserts.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to