Sam Whittle created BEAM-12472:
----------------------------------
Summary: BigQuery streaming writes can be batched beyond request limit with BatchAndInsertElements
Key: BEAM-12472
URL: https://issues.apache.org/jira/browse/BEAM-12472
Project: Beam
Issue Type: Bug
Components: io-java-gcp
Reporter: Sam Whittle
BatchAndInsertElements accumulates all the input elements and flushes them in
finishBundle. However, if there is enough data, the BigQuery request limit can
be exceeded, causing an exception like the one below. It seems that
finishBundle should limit the number of rows and bytes per request and
possibly flush multiple times for a destination.
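
One possible direction, as a minimal sketch only (this is not the actual
BatchedStreamingWrite internals; the class name, MAX_ROWS/MAX_BYTES constants,
and the per-row size estimate are made up for illustration): split the
accumulated rows into batches that stay under both the per-request byte limit
(10485760 bytes, per the error below) and a row-count limit, then issue one
insertAll per batch from finishBundle.

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import java.util.ArrayList;
import java.util.List;

class SizeLimitedFlush {
  // BigQuery's documented streaming-insert limits (assumed values for the sketch).
  private static final long MAX_BYTES = 10L * 1024 * 1024; // 10485760, matches the error
  private static final int MAX_ROWS = 50_000;

  /** Splits the accumulated rows into batches that stay under both limits. */
  static List<List<TableRow>> batch(List<TableRow> accumulated) {
    List<List<TableRow>> batches = new ArrayList<>();
    List<TableRow> current = new ArrayList<>();
    long currentBytes = 0;
    for (TableRow row : accumulated) {
      // Crude per-row size estimate for the sketch; the real fix would use
      // the serialized request size.
      long rowBytes = row.toString().length();
      if (!current.isEmpty()
          && (current.size() >= MAX_ROWS || currentBytes + rowBytes > MAX_BYTES)) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += rowBytes;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches; // caller would issue one insertAll per batch in finishBundle
  }
}
{code}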
A workaround would be to use autosharding, which batches through state with
built-in limits, or to increase the number of streaming keys to decrease the
likelihood of hitting this.
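
For example, a minimal sketch of the autosharding workaround (table spec and
class name are placeholders; schema and create disposition are omitted for
brevity):

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;

class AutoShardingWorkaround {
  // Enables auto-sharding so rows are batched through state, which enforces
  // batching limits, instead of being flushed all at once in finishBundle.
  static void write(PCollection<TableRow> rows) {
    rows.apply(
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withAutoSharding());
  }
}
{code}

Alternatively, raising the BigQueryOptions numStreamingKeys pipeline option
(e.g. --numStreamingKeys=500) spreads rows across more keys, making an
oversized per-destination bundle less likely.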
"Error while processing a work item: UNKNOWN:
org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad
Request
POST
https://bigquery.googleapis.com/bigquery/v2/projects/google.com:clouddfe/datasets/nexmark_06090820455271/tables/nexmark_simple/insertAll?prettyPrint=false
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Request payload size exceeds the limit: 10485760 bytes.",
"reason" : "badRequest"
} ],
"message" : "Request payload size exceeds the limit: 10485760 bytes.",
"status" : "INVALID_ARGUMENT"
}
at
org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
at
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements$DoFnInvoker.invokeFinishBundle(Unknown
Source)
at
org.apache.beam.fn.harness.FnApiDoFnRunner.finishBundle(FnApiDoFnRunner.java:1661)