[
https://issues.apache.org/jira/browse/BEAM-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314113#comment-17314113
]
Aaron Dallas commented on BEAM-11521:
-------------------------------------
I am not the original author of this ticket but this issue is affecting the
operation of my production streaming Dataflow system. The version of Beam used
in this instance is apache_beam-2.26.0-cp38-cp38-manylinux1_x86_64.whl.
I am getting many instances of the error "Request payload size exceeds the
limit: 10485760 bytes" during the WriteToBigQuery stage of the pipeline.
Examining the individual messages showed that none came close to this size
(all were below 1 KB), but my attempts to shrink the window (down to 15
seconds, triggering after 125 events) did not relieve the error.
Google Cloud support recommended that I add the option "maxStreamingBatchSize"
and were then informed that it does not exist in the Python Beam SDK.
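In the meantime, a possible partial workaround, sketched below under the
assumption that the pipeline writes with WriteToBigQuery in streaming-inserts
mode: the Python SDK's batch_size parameter caps the number of rows per insert
request, not the number of bytes, so it only approximates what
maxStreamingBatchSize does in the Java SDK. The table name, schema, and input
rows below are placeholders.
{code:python}
import apache_beam as beam

with beam.Pipeline() as p:
    # Placeholder input; in the real pipeline this is the windowed PCollection.
    rows = p | beam.Create([{'id': 1, 'payload': 'example'}])
    _ = rows | beam.io.WriteToBigQuery(
        table='my-project:my_dataset.my_table',  # placeholder table
        schema='id:INTEGER,payload:STRING',      # placeholder schema
        method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
        # batch_size limits rows per insert request; it does not limit
        # bytes the way the Java SDK's maxStreamingBatchSize does.
        batch_size=100,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
{code}
Lowering batch_size shrinks each request roughly in proportion to the typical
row size, but a single oversized row can still breach the 10485760-byte limit.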
> BigQuery: add maxStreamingBatchSize for Python streaming inserts
> ----------------------------------------------------------------
>
> Key: BEAM-11521
> URL: https://issues.apache.org/jira/browse/BEAM-11521
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Udi Meiri
> Priority: P3
>
> The Java SDK has maxStreamingBatchSize.
> Implementing something similar for Python would prevent 400 HTTP errors from
> BigQuery when batches are too large. Instead, Beam should try to split
> batches and log a warning (once) when trying to insert a single very large
> row.
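To make the requested behavior concrete, here is a purely illustrative sketch
(not the Beam implementation, and not tied to any internal Beam API) of
splitting rows into byte-limited batches and warning once about an oversize
row; the function name and the 9 MB headroom value are assumptions.
{code:python}
import json
import logging


def split_rows_by_bytes(rows, max_batch_bytes=9 * 1024 * 1024):
    """Yields lists of rows whose estimated JSON size stays under max_batch_bytes.

    The 9 MB default leaves headroom under BigQuery's 10485760-byte request cap.
    """
    warned_about_large_row = False
    batch, batch_bytes = [], 0
    for row in rows:
        row_bytes = len(json.dumps(row).encode('utf-8'))
        if row_bytes > max_batch_bytes:
            if not warned_about_large_row:
                logging.warning(
                    'A single row of %d bytes exceeds the %d-byte batch limit.',
                    row_bytes, max_batch_bytes)
                warned_about_large_row = True
            yield [row]  # emit the oversize row alone; the service may still reject it
            continue
        if batch and batch_bytes + row_bytes > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch
{code}
In the sink itself this logic would presumably live inside the streaming-insert
write step, as it does in the Java SDK, rather than in user code.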
--
This message was sent by Atlassian Jira
(v8.3.4#803005)