[
https://issues.apache.org/jira/browse/BEAM-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530137#comment-17530137
]
Bruno Candido Volpato da Cunha commented on BEAM-14364:
-------------------------------------------------------
Same behavior as of 2.37.0, with slightly different stack traces.
The streaming job gets into a retry loop.
{code:java}
2022-04-27 14:27:11.569 EDT Execution of work for computation 'P0' on key 'acust-1:cust1.ingestion_stg' failed with uncaught exception. Work will be retried locally.
{code}
{code:java}
java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://bigquery.googleapis.com/bigquery/v2/projects/cust-1/datasets/cust1/tables/ingestion_stg/insertAll?prettyPrint=false
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Table cust-1:cust1.ingestion_stg",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Table cust-1:cust1.ingestion_stg",
  "status" : "NOT_FOUND"
}
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:1101)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:1154)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.flushRows(BatchedStreamingWrite.java:374)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.access$900(BatchedStreamingWrite.java:69)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements.finishBundle(BatchedStreamingWrite.java:263)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://bigquery.googleapis.com/bigquery/v2/projects/cust-1/datasets/cust1/tables/ingestion_stg/insertAll?prettyPrint=false
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Table cust-1:cust1.ingestion_stg",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Table cust-1:cust1.ingestion_stg",
  "status" : "NOT_FOUND"
}
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$InsertBatchofRowsCallable.call(BigQueryServicesImpl.java:867)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$InsertBatchofRowsCallable.call(BigQueryServicesImpl.java:818)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$BoundedExecutorService$SemaphoreCallable.call(BigQueryServicesImpl.java:1697)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
{code}
> 404s in BigQueryIO don't get output to Failed Inserts PCollection
> -----------------------------------------------------------------
>
> Key: BEAM-14364
> URL: https://issues.apache.org/jira/browse/BEAM-14364
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Svetak Vihaan Sundhar
> Assignee: Svetak Vihaan Sundhar
> Priority: P1
> Attachments: ErrorsInPrototypeJob.PNG
>
>
> Given that BigQueryIO is configured to use createDisposition(CREATE_NEVER),
> and the DynamicDestinations class returns "null" for a schema,
> and the table for that destination does not exist in BigQuery, when I stream
> records to BigQuery for that table, then the write should fail,
> and the failed rows should appear on the output PCollection for Failed
> Inserts (via getFailedInserts()).
>
> Almost all of the time the table exists beforehand, but because new
> tables can be created, we want this case to be non-fatal to the job.
> What we are seeing instead is that processing stops completely in those
> pipelines, and eventually the jobs run out of memory. I feel that the
> appropriate action when BigQuery returns a 404 for the table would be to
> emit those failed TableRows to the output PCollection and continue
> processing as normal.
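
The behavior requested in the description above can be modeled in plain Java. This is only a sketch under stated assumptions, not Beam code: `FailedInsertRouting`, `InsertException`, `Inserter`, `Result`, and `PERSISTENT_REASONS` are hypothetical names invented for illustration, and treating the `notFound` reason as non-retryable is an assumption about the desired policy, not a description of what BigQueryIO currently does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Plain-Java model of the requested routing; none of these types are Beam or BigQuery classes. */
final class FailedInsertRouting {

    /** Hypothetical stand-in for an insertAll error carrying an HTTP status and a reason. */
    static final class InsertException extends RuntimeException {
        final int statusCode;
        final String reason;

        InsertException(int statusCode, String reason) {
            super(statusCode + " " + reason);
            this.statusCode = statusCode;
            this.reason = reason;
        }
    }

    /** Hypothetical stand-in for the service call that streams rows into a table. */
    interface Inserter {
        void insertAll(String table, List<String> rows);
    }

    /** Rows that were written, and rows routed to the failed-inserts output. */
    static final class Result {
        final List<String> written = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    /** Assumption: reasons that retrying will never fix. Only "notFound" comes from this issue. */
    static final Set<String> PERSISTENT_REASONS = Set.of("notFound");

    /**
     * Tries to insert the batch. Persistent errors (such as a 404 for a missing table)
     * send the rows to the failed output immediately; other errors are retried up to
     * maxAttempts times and then also sent to the failed output, so the caller never loops forever.
     */
    static Result write(String table, List<String> rows, Inserter inserter, int maxAttempts) {
        Result result = new Result();
        for (int attempt = 1; ; attempt++) {
            try {
                inserter.insertAll(table, rows);
                result.written.addAll(rows);
                return result;
            } catch (InsertException e) {
                boolean persistent = PERSISTENT_REASONS.contains(e.reason);
                if (persistent || attempt >= maxAttempts) {
                    result.failed.addAll(rows);
                    return result;
                }
                // Otherwise treat the error as transient and try again.
            }
        }
    }

    public static void main(String[] args) {
        // Simulate the missing table from the stack trace in this comment.
        Inserter missingTable = (table, rows) -> {
            throw new InsertException(404, "notFound");
        };
        Result r = write("cust-1:cust1.ingestion_stg", List.of("row-1", "row-2"), missingTable, 5);
        System.out.println("written=" + r.written + " failed=" + r.failed);
    }
}
```

With this policy a missing table costs a single insert attempt: the rows land in `failed` and the caller moves on to the next batch, instead of retrying the same work item indefinitely.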