[
https://issues.apache.org/jira/browse/BEAM-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314113#comment-17314113
]
Aaron Dallas commented on BEAM-11521:
-------------------------------------
I am not the original author of this ticket but this issue is affecting the
operation of my production streaming Dataflow system. The version of Beam used
in this instance is apache_beam-2.26.0-cp38-cp38-manylinux1_x86_64.whl.
I am getting many instances of the error "Request payload size exceeds the
limit: 10485760 bytes" during the WriteToBigQuery stage of the pipeline.
Examining the individual messages showed that none came close to this size
(all were below 1 KB), but my attempts to shrink the window (down to 15
seconds, triggering after 125 events) did not relieve the error.
Google Cloud support recommended that I add the option "maxStreamingBatchSize"
and were then informed that it does not exist in the Python Beam SDK.
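In the meantime, a possible partial workaround, sketched below under the
assumption that the pipeline writes with WriteToBigQuery in streaming-inserts
mode: the Python SDK's batch_size parameter caps the number of rows per insert
request, not the number of bytes, so it only approximates what
maxStreamingBatchSize does in the Java SDK. The table name, schema, and input
rows below are placeholders.
{code:python}
import apache_beam as beam

with beam.Pipeline() as p:
    # Placeholder input; in the real pipeline this is the windowed PCollection.
    rows = p | beam.Create([{'id': 1, 'payload': 'example'}])
    _ = rows | beam.io.WriteToBigQuery(
        table='my-project:my_dataset.my_table',  # placeholder table
        schema='id:INTEGER,payload:STRING',      # placeholder schema
        method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
        # batch_size limits rows per insert request; it does not limit
        # bytes the way the Java SDK's maxStreamingBatchSize does.
        batch_size=100,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
{code}
Lowering batch_size shrinks each request roughly in proportion to the typical
row size, but a single oversized row can still breach the 10485760-byte limit.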
> BigQuery: add maxStreamingBatchSize for Python streaming inserts
> ----------------------------------------------------------------
>
> Key: BEAM-11521
> URL: https://issues.apache.org/jira/browse/BEAM-11521
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Udi Meiri
> Priority: P3
>
> The Java SDK has maxStreamingBatchSize.
> Implementing something similar for Python would prevent 400 HTTP errors from
> BigQuery when batches are too large. Instead, Beam should try to split
> batches and log a warning (once) when trying to insert a single very large
> row.
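To make the requested behavior concrete, here is a purely illustrative sketch
(not the Beam implementation, and not tied to any internal Beam API) of
splitting rows into byte-limited batches and warning once about an oversize
row; the function name and the 9 MB headroom value are assumptions.
{code:python}
import json
import logging


def split_rows_by_bytes(rows, max_batch_bytes=9 * 1024 * 1024):
    """Yields lists of rows whose estimated JSON size stays under max_batch_bytes.

    The 9 MB default leaves headroom under BigQuery's 10485760-byte request cap.
    """
    warned_about_large_row = False
    batch, batch_bytes = [], 0
    for row in rows:
        row_bytes = len(json.dumps(row).encode('utf-8'))
        if row_bytes > max_batch_bytes:
            if not warned_about_large_row:
                logging.warning(
                    'A single row of %d bytes exceeds the %d-byte batch limit.',
                    row_bytes, max_batch_bytes)
                warned_about_large_row = True
            yield [row]  # emit the oversize row alone; the service may still reject it
            continue
        if batch and batch_bytes + row_bytes > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch
{code}
In the sink itself this logic would presumably live inside the streaming-insert
write step, as it does in the Java SDK, rather than in user code.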
--
This message was sent by Atlassian Jira
(v8.3.4#803005)