[
https://issues.apache.org/jira/browse/BEAM-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pablo Estrada updated BEAM-8012:
--------------------------------
Status: Open (was: Triage Needed)
> Perf improvements for Python WriteToBigQuery with Streaming Inserts
> -------------------------------------------------------------------
>
> Key: BEAM-8012
> URL: https://issues.apache.org/jira/browse/BEAM-8012
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Pablo Estrada
> Assignee: Tanay Tummalapalli
> Priority: Major
>
> Users have reported that a pipeline able to process 400 msg/sec/cpu drops
> to 75 msg/sec/cpu when the WriteToBigQuery sink from the Python SDK is
> added.
> Some candidates to be optimized:
> *
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L776-L805]
> - The GetTable method gets called, sometimes very often.
> *
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1017-L1019]
> - The RowAsDictJsonCoder does special treatment of bytes, and for that it
> iterates through the whole record first.
> *
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L823-L840]
> - The batching strategy for the Writing DoFn could possibly be improved.
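
The three candidates above can be illustrated with a minimal, Beam-independent sketch. It is not the code in bigquery.py or bigquery_tools.py. Every name here (fetch_table, get_table_cached, encode_row, batches) is hypothetical, and base64 is only an assumed encoding for bytes fields. The sketch shows one possible direction for each point: memoize the table lookup per destination, let the JSON encoder convert bytes on demand instead of scanning each record beforehand, and flush rows in fixed-size batches.

```python
import base64
import json
from functools import lru_cache

# Hypothetical stand-in for the remote GetTable call; it counts invocations
# so the effect of caching is visible.
GET_TABLE_CALLS = {"count": 0}


def fetch_table(project, dataset, table):
    GET_TABLE_CALLS["count"] += 1
    return {"tableReference": f"{project}:{dataset}.{table}"}


@lru_cache(maxsize=1024)
def get_table_cached(project, dataset, table):
    """Look up each distinct destination at most once per process."""
    return fetch_table(project, dataset, table)


def _encode_bytes(value):
    # json.dumps calls this only for values it cannot serialize itself,
    # so records without bytes fields pay no extra cost.
    if isinstance(value, bytes):
        # Assumption: bytes fields are sent as base64 text.
        return base64.b64encode(value).decode("ascii")
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def encode_row(row):
    """Serialize a row in one pass, converting bytes as they are met."""
    return json.dumps(row, default=_encode_bytes)


def batches(rows, max_batch_size):
    """Yield lists of at most max_batch_size rows, flushing the remainder."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= max_batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

A real sink would also need to invalidate the cached table when a table is created or its schema changes, and would likely bound batches by payload size as well as row count; neither is handled in this sketch.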