[ https://issues.apache.org/jira/browse/BEAM-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122704#comment-17122704 ]

Beam JIRA Bot commented on BEAM-8012:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it 
has been labeled "stale-P2". If this issue is still affecting you, we care! 
Please comment and remove the label. Otherwise, in 14 days the issue will be 
moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed 
explanation of what these priorities mean.


> Perf improvements for Python WriteToBigQuery with Streaming Inserts
> -------------------------------------------------------------------
>
>                 Key: BEAM-8012
>                 URL: https://issues.apache.org/jira/browse/BEAM-8012
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-py-gcp
>            Reporter: Pablo Estrada
>            Priority: P2
>              Labels: stale-P2
>
> Users have reported that a pipeline able to process 400 msg/sec/cpu drops 
> to 75 msg/sec/cpu when the WriteToBigQuery sink from the Python SDK is added.
> Some candidates for optimization:
>  * [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L776-L805]
>  - The GetTable method is called, sometimes very often.
>  * [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1017-L1019]
>  - The RowAsDictJsonCoder applies special handling to bytes, and to do so it 
> first iterates through the whole record.
>  * [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L823-L840]
>  - Could the batching strategy for the writing DoFn be improved?
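
For the GetTable candidate, one common remedy is to memoize table lookups so that the remote call runs once per destination table instead of once per element or bundle. The following is a minimal sketch under that assumption; `TableCache` and `fake_get_table` are hypothetical names standing in for the real GetTable request, not Beam APIs.

```python
from typing import Callable, Dict


class TableCache:
    """Memoizes table metadata so the remote lookup runs once per table.

    `fetch` is a hypothetical stand-in for whatever call retrieves table
    metadata (such as a GetTable request); it is not a Beam API.
    """

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, dict] = {}
        self.remote_calls = 0  # how many times `fetch` actually ran

    def get(self, table_ref: str) -> dict:
        # Only call the remote service on a cache miss.
        if table_ref not in self._cache:
            self.remote_calls += 1
            self._cache[table_ref] = self._fetch(table_ref)
        return self._cache[table_ref]


def fake_get_table(table_ref: str) -> dict:
    # Hypothetical lookup; a real one would issue a request to BigQuery.
    return {"tableReference": table_ref, "schema": {"fields": []}}


cache = TableCache(fake_get_table)
for _ in range(1000):
    cache.get("my-project:my_dataset.events")
print(cache.remote_calls)  # 1
```

In a Beam DoFn, such a cache could be created in `setup()` so that it survives across bundles. The trade-off is staleness: a cached entry will not reflect schema changes made after the first lookup.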

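For the RowAsDictJsonCoder candidate, one way to avoid a separate pass over each record is to let `json.dumps` convert bytes values through its `default` hook, which it calls only for objects it cannot serialize natively, at any nesting depth. This sketch assumes BYTES values are sent as base64-encoded strings; `encode_row` and `_encode_special` are hypothetical helpers, not the actual RowAsDictJsonCoder code.

```python
import base64
import json
from typing import Any


def _encode_special(obj: Any) -> Any:
    # json.dumps calls this hook only for objects it cannot serialize itself,
    # so bytes values are converted as they are encountered during the single
    # serialization pass instead of in a separate scan of the record.
    if isinstance(obj, bytes):
        # Assumption: BYTES columns are sent as base64-encoded strings.
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def encode_row(row: dict) -> bytes:
    # allow_nan=False rejects NaN/Infinity, which are not valid JSON.
    return json.dumps(row, default=_encode_special, allow_nan=False).encode("utf-8")


print(encode_row({"id": 1, "payload": b"\x00\xff", "tags": ["a", b"b"]}))
# b'{"id": 1, "payload": "AP8=", "tags": ["a", "Yg=="]}'
```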

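For the batching candidate, one possible strategy is to bound each insert request by both row count and encoded payload size, flushing whenever the next row would exceed either limit. The sketch below is hypothetical: `batch_rows`, `max_rows`, and `max_bytes` are illustrative names and defaults, and real limits should be taken from the BigQuery streaming-insert quotas.

```python
from typing import Iterable, Iterator, List


def batch_rows(
    encoded_rows: Iterable[bytes],
    max_rows: int = 500,
    max_bytes: int = 5_000_000,
) -> Iterator[List[bytes]]:
    """Groups pre-encoded rows into batches bounded by count and payload size.

    The default limits are hypothetical values for illustration only.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for row in encoded_rows:
        # Flush before adding a row that would push the batch over either limit.
        if batch and (len(batch) >= max_rows or batch_bytes + len(row) > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += len(row)
    if batch:  # emit the final partial batch
        yield batch


rows = [b'{"id": %d}' % i for i in range(1200)]
print([len(b) for b in batch_rows(rows)])  # [500, 500, 200]
```

Bounding by bytes as well as rows keeps requests made of large rows from growing too big, while the row cap does the same for batches of small rows. A single row larger than `max_bytes` still ends up alone in its own batch, so oversized rows would need separate handling.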

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
