Sam Whittle created BEAM-12472:
----------------------------------

             Summary: BigQuery streaming writes can be batched beyond request limit with BatchAndInsertElements
                 Key: BEAM-12472
                 URL: https://issues.apache.org/jira/browse/BEAM-12472
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp
            Reporter: Sam Whittle


BatchAndInsertElements accumulates all of its input elements and flushes them in finishBundle. However, given enough data, the BigQuery request limit can be exceeded, causing an exception like the one below. It seems that finishBundle should limit the number of rows and bytes per request, flushing multiple times for a destination if necessary; a sketch of that batching follows.
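
For illustration, here is a minimal sketch of the size-aware chunking finishBundle could apply before calling insertAll. Everything here is hypothetical (the InsertAllChunker class, the 500-row cap, and the sizeOf callback); only the 10485760-byte payload limit is taken from the error below.

import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: split a destination's pending rows into insertAll-sized chunks. */
public final class InsertAllChunker {
  // 10 MiB matches the payload limit reported in the error below; the
  // per-request row cap is an assumed conservative value, not Beam's.
  private static final long MAX_REQUEST_BYTES = 10L * 1024 * 1024;
  private static final int MAX_ROWS_PER_REQUEST = 500;

  public static <T> List<List<T>> chunk(List<T> rows, ToLongFunction<T> sizeOf) {
    List<List<T>> chunks = new ArrayList<>();
    List<T> current = new ArrayList<>();
    long currentBytes = 0;
    for (T row : rows) {
      long rowBytes = sizeOf.applyAsLong(row);
      // Close out the current chunk before either limit would be exceeded.
      // (A single oversized row still gets its own chunk and would fail anyway.)
      if (!current.isEmpty()
          && (currentBytes + rowBytes > MAX_REQUEST_BYTES
              || current.size() >= MAX_ROWS_PER_REQUEST)) {
        chunks.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += rowBytes;
    }
    if (!current.isEmpty()) {
      chunks.add(current);
    }
    return chunks;
  }
}

finishBundle would then issue one insertAll call per returned chunk instead of a single call for everything accumulated in the bundle.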

A workaround is to use autosharding, which uses state that has batching limits, or to increase the number of streaming keys to decrease the likelihood of hitting this; an example of the former is sketched below.
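
A minimal sketch of the autosharding workaround, assuming an SDK version in which BigQueryIO.Write#withAutoSharding is available for streaming inserts; the table spec and the rows PCollection are placeholders:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;

// rows: an existing PCollection<TableRow> in the pipeline.
rows.apply(
    "StreamToBigQuery",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table")  // placeholder table spec
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        // Batches via state, which enforces per-request batching limits.
        .withAutoSharding());

The alternative mitigation, raising the streaming key count so each key accumulates fewer rows per bundle, only lowers the probability of an oversized request rather than eliminating it.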

"Error while processing a work item: UNKNOWN: 
org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
Request
POST 
https://bigquery.googleapis.com/bigquery/v2/projects/google.com:clouddfe/datasets/nexmark_06090820455271/tables/nexmark_simple/insertAll?prettyPrint=false
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}
        at 
org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
        at 
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements$DoFnInvoker.invokeFinishBundle(Unknown
 Source)
        at 
org.apache.beam.fn.harness.FnApiDoFnRunner.finishBundle(FnApiDoFnRunner.java:1661)


