[jira] [Commented] (BEAM-14364) 404s in BigQueryIO don't get output to Failed Inserts PCollection

Darren Norton (Jira) Thu, 28 Apr 2022 21:21:07 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529755#comment-17529755
 ]


Darren Norton commented on BEAM-14364:
--------------------------------------

Just as a note, this job was running in Apache Beam 2.33.0.

I wanted to add some information to the initial ticket. To start, here is a 
screenshot showing the diagnostics for the job itself. 
!ErrorsInPrototypeJob.PNG!

I've been watching a few versions of Apache Beam, and the ManagedChannel 
allocation site errors seem to come up regularly when interacting with the 
various BigQueryServicesImpl endpoints, so I suspect an exception isn't being 
handled somewhere in there.

Here is a stacktrace for the 404. This was one from my testing environment 
where I ran the job with a missing table.
{code:java}
java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://bigquery.googleapis.com/bigquery/v2/projects/blz-d-gdp-telem-data-81/datasets/telem_atlas/tables/log/insertAll?prettyPrint=false
{ "code" : 404, "errors" : [ { "domain" : "global", "message" : "Not found: Table blz-d-gdp-telem-data-81:telem_atlas.log", "reason" : "notFound" } ], "message" : "Not found: Table blz-d-gdp-telem-data-81:telem_atlas.log", "status" : "NOT_FOUND" }

at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll ( org/apache.beam.sdk.io.gcp.bigquery/BigQueryServicesImpl.java:994 )
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll ( org/apache.beam.sdk.io.gcp.bigquery/BigQueryServicesImpl.java:1047 )
at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.flushRows ( org/apache.beam.sdk.io.gcp.bigquery/BatchedStreamingWrite.java:387 )
at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.access$800 ( org/apache.beam.sdk.io.gcp.bigquery/BatchedStreamingWrite.java:72 )
at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$InsertBatchedElements.processElement ( org/apache.beam.sdk.io.gcp.bigquery/BatchedStreamingWrite.java:350 )
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException

at com.google.api.client.googleapis.json.GoogleJsonResponseException.from ( GoogleJsonResponseException.java:146 )
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError ( AbstractGoogleJsonClientRequest.java:118 )
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError ( AbstractGoogleJsonClientRequest.java:37 )
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse ( AbstractGoogleClientRequest.java:428 )
at com.google.api.client.http.HttpRequest.execute ( HttpRequest.java:1111 )
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed ( AbstractGoogleClientRequest.java:514 )
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed ( AbstractGoogleClientRequest.java:455 )
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute ( AbstractGoogleClientRequest.java:565 )
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.lambda$insertAll$1 ( BigQueryServicesImpl.java:903 )
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$BoundedExecutorService$SemaphoreCallable.call ( BigQueryServicesImpl.java:1560 )
at java.util.concurrent.FutureTask.run ( FutureTask.java:264 )
at java.util.concurrent.ThreadPoolExecutor.runWorker ( ThreadPoolExecutor.java:1128 )
at java.util.concurrent.ThreadPoolExecutor$Worker.run ( ThreadPoolExecutor.java:628 )
at java.lang.Thread.run ( Thread.java:834 )
{code}
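
To make the handling the ticket asks for concrete, here is a minimal 
stand-alone sketch of routing a "table not found" failure to a failed-inserts 
output instead of letting it propagate. It does not use Beam or the BigQuery 
client, and it is not how BigQueryServicesImpl works; FailedInsertRouting, 
TableNotFoundException, WriteResult, and the table names are all hypothetical 
and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Stand-alone simulation; none of these names are Beam or BigQuery APIs. */
final class FailedInsertRouting {

    /** Hypothetical stand-in for the 404 raised when the table is missing. */
    static final class TableNotFoundException extends Exception {
        TableNotFoundException(String table) {
            super("Not found: Table " + table);
        }
    }

    /** Hypothetical result holder: rows written, and rows diverted on failure. */
    static final class WriteResult {
        final List<String> inserted = new ArrayList<>();
        final List<String> failedInserts = new ArrayList<>();
    }

    /** Pretend streaming insert: throws when the destination table is unknown. */
    static void insertAll(Set<String> existingTables, String table, List<String> rows)
            throws TableNotFoundException {
        if (!existingTables.contains(table)) {
            throw new TableNotFoundException(table);
        }
        // A real insert would send the rows here; the simulation does nothing.
    }

    /** Writes each batch; a missing table diverts that batch and processing continues. */
    static WriteResult write(Set<String> existingTables, Map<String, List<String>> batches) {
        WriteResult result = new WriteResult();
        for (Map.Entry<String, List<String>> batch : batches.entrySet()) {
            try {
                insertAll(existingTables, batch.getKey(), batch.getValue());
                result.inserted.addAll(batch.getValue());
            } catch (TableNotFoundException e) {
                // Divert the rows instead of rethrowing, so later batches still run.
                result.failedInserts.addAll(batch.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        batches.put("demo.missing", Arrays.asList("row-1", "row-2"));
        batches.put("demo.present", Arrays.asList("row-3"));

        WriteResult result = write(Collections.singleton("demo.present"), batches);
        System.out.println("inserted=" + result.inserted);
        System.out.println("failedInserts=" + result.failedInserts);
    }
}
```

The point of the sketch is only the catch block: the batch for the missing 
table ends up in failedInserts, and the batch after it is still written.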

> 404s in BigQueryIO don't get output to Failed Inserts PCollection
> -----------------------------------------------------------------
>
>                 Key: BEAM-14364
>                 URL: https://issues.apache.org/jira/browse/BEAM-14364
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Svetak Vihaan Sundhar
>            Assignee: Svetak Vihaan Sundhar
>            Priority: P1
>         Attachments: ErrorsInPrototypeJob.PNG
>
>
> Given that BigQueryIO is configured to use createDisposition(CREATE_NEVER),
> and the DynamicDestinations class returns "null" for a schema,
> and the table for that destination does not exist in BigQuery, When I stream 
> records to BigQuery for that table, then the write should fail,
> and the failed rows should appear on the output PCollection for Failed 
> Inserts (via getFailedInserts()).
>  
> Almost all of the time, the table exists beforehand, but given that new 
> tables can be created, we want this behavior to be non-explosive to the job. 
> However, what we are seeing is that processing completely stops in those 
> pipelines, and eventually the jobs run out of memory. I feel that the 
> appropriate action when BigQuery returns a 404 for the table would be to 
> submit those failed TableRows to the output PCollection and continue 
> processing as normal.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
