[
https://issues.apache.org/jira/browse/BEAM-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kenneth Knowles updated BEAM-12472:
-----------------------------------
Description:
BatchAndInsertElements accumulates all the input elements and flushes them in
finishBundle. However, with enough data the BigQuery request limit can be
exceeded, causing an exception like the one in the trace below. It seems that
finishBundle should cap the number of rows and bytes per request and flush
multiple times for a destination when necessary.
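For illustration, a minimal sketch (not the actual BatchAndInsertElements
code) of how finishBundle could split the accumulated rows into requests that
stay under the insertAll payload limit; the byte estimate, row cap, and
flushChunk helper are hypothetical:
{code}
import java.util.ArrayList;
import java.util.List;

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;

// Sketch only: accumulates rows per bundle as today, but flushes them in
// bounded chunks instead of one unbounded request.
class BoundedFlushFn extends DoFn<TableRow, Void> {
  private static final long MAX_REQUEST_BYTES = 9L * 1024 * 1024; // margin under 10485760
  private static final int MAX_REQUEST_ROWS = 500;                // hypothetical row cap

  private transient List<TableRow> accumulated;

  @StartBundle
  public void startBundle() {
    accumulated = new ArrayList<>();
  }

  @ProcessElement
  public void processElement(@Element TableRow row) {
    accumulated.add(row);
  }

  @FinishBundle
  public void finishBundle() {
    List<TableRow> chunk = new ArrayList<>();
    long chunkBytes = 0;
    for (TableRow row : accumulated) {
      long rowBytes = row.toString().length(); // rough size estimate
      if (!chunk.isEmpty()
          && (chunkBytes + rowBytes > MAX_REQUEST_BYTES || chunk.size() >= MAX_REQUEST_ROWS)) {
        flushChunk(chunk); // one insertAll request per bounded chunk
        chunk = new ArrayList<>();
        chunkBytes = 0;
      }
      chunk.add(row);
      chunkBytes += rowBytes;
    }
    if (!chunk.isEmpty()) {
      flushChunk(chunk);
    }
  }

  private void flushChunk(List<TableRow> chunk) {
    // Hypothetical helper: issue a single tabledata.insertAll call for "chunk".
  }
}
{code}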
A workaround is to use autosharding, which batches via state and therefore
respects batching limits, or to increase the number of streaming keys to
reduce the likelihood of hitting the limit.
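A minimal sketch of the autosharding workaround, assuming the Beam Java SDK
with BigQueryIO streaming inserts; the table spec and class name are
placeholders:
{code}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;

class AutoShardedWriteExample {
  // "rows" is any unbounded PCollection of TableRows.
  static void write(PCollection<TableRow> rows) {
    rows.apply(
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table") // placeholder table spec
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            // Stateful autosharding batches rows under the insertAll limits
            // instead of flushing everything in finishBundle.
            .withAutoSharding());
  }
}
{code}
Alternatively, spreading rows over more shard keys (e.g. via the
numStreamingKeys option in BigQueryOptions, if the SDK version in use exposes
it) makes an oversized bundle less likely.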
{code}
Error while processing a work item: UNKNOWN:
org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad
Request
POST
https://bigquery.googleapis.com/bigquery/v2/projects/google.com:clouddfe/datasets/nexmark_06090820455271/tables/nexmark_simple/insertAll?prettyPrint=false
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Request payload size exceeds the limit: 10485760 bytes.",
"reason" : "badRequest"
} ],
"message" : "Request payload size exceeds the limit: 10485760 bytes.",
"status" : "INVALID_ARGUMENT"
}
at
org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
at
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements$DoFnInvoker.invokeFinishBundle(Unknown
Source)
at
org.apache.beam.fn.harness.FnApiDoFnRunner.finishBundle(FnApiDoFnRunner.java:1661)
{code}
> BigQuery streaming writes can be batched beyond request limit with
> BatchAndInsertElements
> -----------------------------------------------------------------------------------------
>
> Key: BEAM-12472
> URL: https://issues.apache.org/jira/browse/BEAM-12472
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Sam Whittle
> Priority: P2
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)