[
https://issues.apache.org/jira/browse/BEAM-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Beam JIRA Bot reassigned BEAM-11330:
------------------------------------
Assignee: (was: Pablo Estrada)
> BigQueryServicesImpl.insertAll evaluates maxRowBatchSize after a row is added
> to the batch
> ------------------------------------------------------------------------------------------
>
> Key: BEAM-11330
> URL: https://issues.apache.org/jira/browse/BEAM-11330
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.22.0, 2.23.0, 2.24.0, 2.25.0
> Reporter: Liam Haworth
> Priority: P3
> Labels: stale-assigned
>
> When using the {{BigQueryIO.Write}} transformation, a set of pipeline options
> defined in {{BigQueryOptions}} becomes available to the pipeline.
> Two of these options are:
> * {{maxStreamingRowsToBatch}} - "The maximum number of rows to batch in a
> single streaming insert to BigQuery."
> * {{maxStreamingBatchSize}} - "The maximum byte size of a single streaming
> insert to BigQuery"
> The description of {{maxStreamingBatchSize}} gives the impression that the
> BigQuery sink will ensure that each batch is at or under the configured
> maximum byte size.
> However, after [reviewing the code of the internal sink
> transformation|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L826],
> I can see that the batching code first adds a row to the batch and only then
> compares the new batch size against the configured maximum.
> The description of the option, {{maxStreamingBatchSize}}, gives the end user
> the impression that it will protect them from batches that exceed the size
> limit of the BigQuery streaming inserts API.
> In reality, it can produce a batch that massively exceeds the limit, and the
> transformation then gets stuck in a loop, constantly retrying the request.
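
The difference between the two orderings can be illustrated with a minimal,
self-contained sketch. This is not the code in BigQueryServicesImpl; the class
BatchSizeSketch, both method names, and the use of plain integers as row byte
sizes are hypothetical and exist only for illustration. flushAfterAdd models
the behaviour described in the issue (add the row, then compare), while
flushBeforeAdd models one possible alternative (compare first, flush if the
next row would not fit).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: rows are represented only by their byte sizes.
final class BatchSizeSketch {

  // Behaviour described in the issue: the row is added first, and the
  // batch size is compared against the limit only afterwards.
  static List<List<Integer>> flushAfterAdd(List<Integer> rowSizes, long maxBytes) {
    List<List<Integer>> batches = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long currentBytes = 0;
    for (int size : rowSizes) {
      current.add(size);
      currentBytes += size;
      if (currentBytes >= maxBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  // Assumed alternative: flush the current batch before adding a row
  // that would push it over the limit.
  static List<List<Integer>> flushBeforeAdd(List<Integer> rowSizes, long maxBytes) {
    List<List<Integer>> batches = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long currentBytes = 0;
    for (int size : rowSizes) {
      if (!current.isEmpty() && currentBytes + size > maxBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(size);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Integer> rowSizes = Arrays.asList(40, 40, 90);
    // One batch of 170 bytes, well over the 100-byte limit.
    System.out.println(flushAfterAdd(rowSizes, 100));  // [[40, 40, 90]]
    // Two batches of 80 and 90 bytes, both within the limit.
    System.out.println(flushBeforeAdd(rowSizes, 100)); // [[40, 40], [90]]
  }
}
```

Note that even the check-before-add variant in this sketch still emits an
oversized batch when a single row is larger than the limit on its own; such
rows would need separate handling.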