[ https://issues.apache.org/jira/browse/BEAM-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-12472:
-----------------------------------
    Description: 
BatchAndInsertElements accumulates all the input elements and flushes them in 
finishBundle.
However, if there is enough data, the BigQuery request limit can be exceeded, 
causing an exception like the one below. It seems that finishBundle should 
limit the number of rows and bytes per request, and possibly flush multiple 
times for a destination.

A workaround is to use autosharding, which uses state with batching limits, 
or to increase the number of streaming keys to decrease the likelihood of 
hitting this.
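
For illustration, the autosharding workaround is just an extra call on the sink. 
A minimal sketch, assuming `rows` is an existing PCollection<TableRow> and 
`tableSpec` names the destination table (both placeholders):

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;

class AutoShardingWorkaround {
  // Sketch: wire the streaming-insert sink with autosharding enabled.
  static void writeWithAutoSharding(PCollection<TableRow> rows, String tableSpec) {
    rows.apply("WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to(tableSpec) // placeholder, e.g. "my-project:my_dataset.my_table"
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            // withAutoSharding() routes through the stateful batching path,
            // which caps rows/bytes per insertAll request.
            .withAutoSharding());
  }
}
{code}

The other workaround, raising the key count (the numStreamingKeys pipeline 
option on BigQueryOptions), only spreads rows over more keys; it lowers the 
odds of an oversized bundle rather than bounding the request size.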

{code}
Error while processing a work item: UNKNOWN: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
POST https://bigquery.googleapis.com/bigquery/v2/projects/google.com:clouddfe/datasets/nexmark_06090820455271/tables/nexmark_simple/insertAll?prettyPrint=false
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}
        at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
        at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements$DoFnInvoker.invokeFinishBundle(Unknown Source)
        at org.apache.beam.fn.harness.FnApiDoFnRunner.finishBundle(FnApiDoFnRunner.java:1661)
{code}
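
A minimal sketch of the fix suggested above: flush whenever the buffered batch 
would cross a row or byte threshold, instead of only once in finishBundle. The 
class name and limits here are hypothetical, not the actual 
BatchAndInsertElements code; the 10485760-byte figure comes from the error 
above, and the row cap is assumed (Beam's streaming-insert batching uses a 
similar default).

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

/**
 * Hypothetical sketch, not the actual Beam code: buffer rows, but emit a
 * batch whenever it would cross the insertAll limits, instead of holding
 * every element until finishBundle.
 */
class SizeLimitedBatchFn extends DoFn<String, List<String>> {
  // Keep a margin under the 10485760-byte request limit from the error above.
  private static final long MAX_BATCH_BYTES = 9L * 1024 * 1024;
  // Assumed per-request row cap.
  private static final int MAX_BATCH_ROWS = 500;

  private transient List<String> buffer;
  private transient long bufferBytes;
  private transient Instant lastTimestamp; // for the finishBundle flush
  private transient BoundedWindow lastWindow;

  @StartBundle
  public void startBundle() {
    buffer = new ArrayList<>();
    bufferBytes = 0;
  }

  @ProcessElement
  public void processElement(ProcessContext c, BoundedWindow window) {
    String row = c.element();
    long rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
    // Flush first if adding this row would push the batch over either limit.
    if (!buffer.isEmpty()
        && (bufferBytes + rowBytes > MAX_BATCH_BYTES || buffer.size() >= MAX_BATCH_ROWS)) {
      c.output(new ArrayList<>(buffer));
      buffer.clear();
      bufferBytes = 0;
    }
    buffer.add(row);
    bufferBytes += rowBytes;
    lastTimestamp = c.timestamp();
    lastWindow = window;
  }

  @FinishBundle
  public void finishBundle(FinishBundleContext ctx) {
    if (!buffer.isEmpty()) {
      // Emit the remainder with the last seen timestamp/window (simplified;
      // the real fix would track these per destination).
      ctx.output(new ArrayList<>(buffer), lastTimestamp, lastWindow);
      buffer.clear();
      bufferBytes = 0;
    }
  }
}
{code}

Flushing before adding, rather than after, keeps every request under the caps 
as long as any single row fits.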

> BigQuery streaming writes can be batched beyond request limit with BatchAndInsertElements
> -----------------------------------------------------------------------------------------
>
>                 Key: BEAM-12472
>                 URL: https://issues.apache.org/jira/browse/BEAM-12472
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Sam Whittle
>            Priority: P2
>
> BatchAndInsertElements accumulates all the input elements and flushes them in 
> finishBundle.
> However, if there is enough data, the BigQuery request limit can be exceeded, 
> causing an exception like the one below. It seems that finishBundle should 
> limit the number of rows and bytes per request, and possibly flush multiple 
> times for a destination.
> A workaround is to use autosharding, which uses state with batching limits, 
> or to increase the number of streaming keys to decrease the likelihood of 
> hitting this.
> {code}
> Error while processing a work item: UNKNOWN: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
> POST https://bigquery.googleapis.com/bigquery/v2/projects/google.com:clouddfe/datasets/nexmark_06090820455271/tables/nexmark_simple/insertAll?prettyPrint=false
> {
>   "code" : 400,
>   "errors" : [ {
>     "domain" : "global",
>     "message" : "Request payload size exceeds the limit: 10485760 bytes.",
>     "reason" : "badRequest"
>   } ],
>   "message" : "Request payload size exceeds the limit: 10485760 bytes.",
>   "status" : "INVALID_ARGUMENT"
> }
>       at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
>       at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements$DoFnInvoker.invokeFinishBundle(Unknown Source)
>       at org.apache.beam.fn.harness.FnApiDoFnRunner.finishBundle(FnApiDoFnRunner.java:1661)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
