[
https://issues.apache.org/jira/browse/BEAM-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529755#comment-17529755
]
Darren Norton commented on BEAM-14364:
--------------------------------------
For reference, this job was running on Apache Beam 2.33.0.
I wanted to add a bit more information to what is in the initial ticket. To
start, here is a screenshot of the diagnostics for the job itself.
!ErrorsInPrototypeJob.PNG!
I've been watching a few versions of Apache Beam, and the ManagedChannel
allocation site errors seem fairly common when interacting with the various
BigQueryServicesImpl endpoints, so I suspect an exception is not being handled
somewhere in there.
Here is a stack trace for the 404. It comes from my testing environment, where
I ran the job against a missing table.
{code:java}
java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://bigquery.googleapis.com/bigquery/v2/projects/blz-d-gdp-telem-data-81/datasets/telem_atlas/tables/log/insertAll?prettyPrint=false
{
  "code" : 404,
  "errors" : [ { "domain" : "global", "message" : "Not found: Table blz-d-gdp-telem-data-81:telem_atlas.log", "reason" : "notFound" } ],
  "message" : "Not found: Table blz-d-gdp-telem-data-81:telem_atlas.log",
  "status" : "NOT_FOUND"
}
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll ( org/apache.beam.sdk.io.gcp.bigquery/BigQueryServicesImpl.java:994 )
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll ( org/apache.beam.sdk.io.gcp.bigquery/BigQueryServicesImpl.java:1047 )
    at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.flushRows ( org/apache.beam.sdk.io.gcp.bigquery/BatchedStreamingWrite.java:387 )
    at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.access$800 ( org/apache.beam.sdk.io.gcp.bigquery/BatchedStreamingWrite.java:72 )
    at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$InsertBatchedElements.processElement ( org/apache.beam.sdk.io.gcp.bigquery/BatchedStreamingWrite.java:350 )
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from ( GoogleJsonResponseException.java:146 )
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError ( AbstractGoogleJsonClientRequest.java:118 )
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError ( AbstractGoogleJsonClientRequest.java:37 )
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse ( AbstractGoogleClientRequest.java:428 )
    at com.google.api.client.http.HttpRequest.execute ( HttpRequest.java:1111 )
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed ( AbstractGoogleClientRequest.java:514 )
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed ( AbstractGoogleClientRequest.java:455 )
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute ( AbstractGoogleClientRequest.java:565 )
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.lambda$insertAll$1 ( BigQueryServicesImpl.java:903 )
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$BoundedExecutorService$SemaphoreCallable.call ( BigQueryServicesImpl.java:1560 )
    at java.util.concurrent.FutureTask.run ( FutureTask.java:264 )
    at java.util.concurrent.ThreadPoolExecutor.runWorker ( ThreadPoolExecutor.java:1128 )
    at java.util.concurrent.ThreadPoolExecutor$Worker.run ( ThreadPoolExecutor.java:628 )
    at java.lang.Thread.run ( Thread.java:834 )
{code}
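To make the behavior requested in the ticket concrete, here is a rough standalone sketch of routing a permanent error such as this 404 to a failed output instead of retrying it indefinitely. It is not Beam's implementation and uses no Beam or BigQuery client classes: InsertErrorRouting, FlushResult, isRetryable, flush, the status map that fakes the service, and the table name telem_atlas.existing are all hypothetical, and treating only 429 and 5xx as transient is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Standalone sketch: none of these types are Beam or BigQuery client classes.
final class InsertErrorRouting {

    /** Rows that were written and rows routed to the failed-inserts output. */
    static final class FlushResult {
        final List<String> written = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    /** Assumption of this sketch: only 429 and 5xx responses are worth retrying. */
    static boolean isRetryable(int httpStatus) {
        return httpStatus == 429 || httpStatus >= 500;
    }

    /**
     * Flushes each table's rows against a fake service. statusByTable maps a table
     * name to the HTTP status the fake always returns; unlisted tables return 200.
     */
    static FlushResult flush(Map<String, List<String>> rowsByTable,
                             Map<String, Integer> statusByTable,
                             int maxAttempts) {
        FlushResult result = new FlushResult();
        for (Map.Entry<String, List<String>> entry : rowsByTable.entrySet()) {
            int status;
            int attempt = 0;
            do {
                status = statusByTable.getOrDefault(entry.getKey(), 200);
                attempt++;
                // Only transient errors are retried, and only up to maxAttempts.
            } while (isRetryable(status) && attempt < maxAttempts);

            if (status == 200) {
                result.written.addAll(entry.getValue());
            } else {
                // Permanent error (e.g. 404) or retries exhausted:
                // emit the rows as failed instead of blocking the batch.
                result.failed.addAll(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FlushResult result = flush(
            Map.of("telem_atlas.existing", List.of("row-1"),
                   "telem_atlas.log", List.of("row-2", "row-3")),
            Map.of("telem_atlas.log", 404),
            3);
        System.out.println("written=" + result.written + " failed=" + result.failed);
    }
}
```

The point of the sketch is only the control flow: the rows for the missing table end up in the failed list after a single attempt, and the rows for the other table are still written.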
> 404s in BigQueryIO don't get output to Failed Inserts PCollection
> -----------------------------------------------------------------
>
> Key: BEAM-14364
> URL: https://issues.apache.org/jira/browse/BEAM-14364
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Svetak Vihaan Sundhar
> Assignee: Svetak Vihaan Sundhar
> Priority: P1
> Attachments: ErrorsInPrototypeJob.PNG
>
>
> Given that BigQueryIO is configured to use createDisposition(CREATE_NEVER),
> and the DynamicDestinations class returns "null" for a schema,
> and the table for that destination does not exist in BigQuery,
> when I stream records to BigQuery for that table,
> then the write should fail,
> and the failed rows should appear on the output PCollection for Failed
> Inserts (via getFailedInserts()).
>
> Almost all of the time, the table exists beforehand, but since new tables
> can be created, we want this case to be non-fatal to the job. What we are
> seeing instead is that processing stops completely in those pipelines, and
> eventually the jobs run out of memory. I feel that the appropriate action
> when BigQuery returns a 404 for the table would be to submit those failed
> TableRows to the output PCollection and continue processing as normal.