[ https://issues.apache.org/jira/browse/BEAM-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997156#comment-16997156 ]
Pavlo Pohrrebnyi commented on BEAM-7403:
----------------------------------------
Looks like this was a Dataflow Runner issue, and Google has resolved it.
> BigQueryIO.Write does not autoscale correctly (idle workers)
> ------------------------------------------------------------
>
> Key: BEAM-7403
> URL: https://issues.apache.org/jira/browse/BEAM-7403
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Pavlo Pohrrebnyi
> Priority: Major
>
> Apache Beam version: 2.10
> Java SDK
> Dataflow GCP Staged
> Details:
> We have a streaming Dataflow job that ingests data into BigQuery (Streaming
> Inserts).
> We deploy the job with the max number of workers set to 40, and
> there is already a huge backlog (high watermark).
> When the job starts, it scales from 0 to 3 workers
> and starts ingesting at a rate of 12000 messages/sec.
> After 2 minutes it scales from 3 to 40 workers to keep up with the backlog.
> After scaling up, the rate never goes higher than it was with 3 workers
> (12000 messages/sec).
> We have memory consumption metrics in Stackdriver; they show
> that the first 3 workers consume about 5 GB of RAM and the remaining 37
> workers consume about 0.2 GB of RAM. It appears that these autoscaled
> workers are idle.
> Importantly, they don’t contribute to the BigQuery Streaming Inserts process.
> Autoscaling works fine in our other streaming pipelines,
> so this appears to be related to BigQuery streaming inserts.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)