[ 
https://issues.apache.org/jira/browse/BEAM-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530137#comment-17530137
 ] 

Bruno Candido Volpato da Cunha commented on BEAM-14364:
-------------------------------------------------------

The behavior is the same as of 2.37.0, with slightly different stack traces.

The streaming job gets into a retry loop.
{code:java}
2022-04-27 14:27:11.569 EDT Execution of work for computation 'P0' on key 'acust-1:cust1.ingestion_stg' failed with uncaught exception. Work will be retried locally.
{code}
 
{code:java}
java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://bigquery.googleapis.com/bigquery/v2/projects/cust-1/datasets/cust1/tables/ingestion_stg/insertAll?prettyPrint=false
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Table cust-1:cust1.ingestion_stg",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Table cust-1:cust1.ingestion_stg",
  "status" : "NOT_FOUND"
}
        at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:1101)
        at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:1154)
        at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.flushRows(BatchedStreamingWrite.java:374)
        at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.access$900(BatchedStreamingWrite.java:69)
        at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements.finishBundle(BatchedStreamingWrite.java:263)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://bigquery.googleapis.com/bigquery/v2/projects/cust-1/datasets/cust1/tables/ingestion_stg/insertAll?prettyPrint=false
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Table cust-1:cust1.ingestion_stg",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Table cust-1:cust1.ingestion_stg",
  "status" : "NOT_FOUND"
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
        at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$InsertBatchofRowsCallable.call(BigQueryServicesImpl.java:867)
        at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$InsertBatchofRowsCallable.call(BigQueryServicesImpl.java:818)
        at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$BoundedExecutorService$SemaphoreCallable.call(BigQueryServicesImpl.java:1697)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
{code}
 

 

> 404s in BigQueryIO don't get output to Failed Inserts PCollection
> -----------------------------------------------------------------
>
>                 Key: BEAM-14364
>                 URL: https://issues.apache.org/jira/browse/BEAM-14364
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Svetak Vihaan Sundhar
>            Assignee: Svetak Vihaan Sundhar
>            Priority: P1
>         Attachments: ErrorsInPrototypeJob.PNG
>
>
> Given that BigQueryIO is configured to use createDisposition(CREATE_NEVER),
> and the DynamicDestinations class returns "null" for a schema,
> and the table for that destination does not exist in BigQuery, When I stream 
> records to BigQuery for that table, then the write should fail,
> and the failed rows should appear on the output PCollection for Failed 
> Inserts (via getFailedInserts()).
>  
> Almost all of the time, the table exists beforehand, but because new 
> tables can be created, we want this behavior to be non-explosive to the job. 
> However, what we are seeing is that processing completely stops in those 
> pipelines, and eventually the jobs run out of memory. I feel that the 
> appropriate action when BigQuery returns a 404 for the table would be to 
> submit those failed TableRows to the output PCollection and continue 
> processing as normal.
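
The behavior requested in the description above can be illustrated with a minimal, Beam-free sketch. Everything in it is an assumption made for illustration: FailedInsertRouting, TableSink, InsertException and flush are hypothetical names, not part of BigQueryIO, and the sketch does not reflect how BatchedStreamingWrite is actually implemented. It only shows the idea of treating a 404 as non-retryable and routing the rows to a failed output instead of retrying the work item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these types exist in Beam. */
public class FailedInsertRouting {

    /** Minimal stand-in for an HTTP error returned by an insertAll request. */
    static class InsertException extends RuntimeException {
        final int statusCode;

        InsertException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
    }

    /** Hypothetical sink: throws InsertException when the request fails. */
    interface TableSink {
        void insertAll(String table, List<Map<String, Object>> rows);
    }

    /** Outcome of one flush: rows written, and rows routed to the failed output. */
    static class FlushResult {
        final List<Map<String, Object>> written = new ArrayList<>();
        final List<Map<String, Object>> failed = new ArrayList<>();
    }

    static FlushResult flush(TableSink sink, String table, List<Map<String, Object>> rows) {
        FlushResult result = new FlushResult();
        try {
            sink.insertAll(table, rows);
            result.written.addAll(rows);
        } catch (InsertException e) {
            if (e.statusCode == 404) {
                // The table is missing, so a retry cannot succeed:
                // route the rows to the failed output and keep processing.
                result.failed.addAll(rows);
            } else {
                // Any other error is rethrown and left to the caller's retry logic.
                throw e;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulates a destination table that does not exist.
        TableSink missingTable = (table, rows) -> {
            throw new InsertException(404, "Not found: Table " + table);
        };
        List<Map<String, Object>> rows = List.of(
            Map.<String, Object>of("id", 1),
            Map.<String, Object>of("id", 2));

        FlushResult result = flush(missingTable, "cust-1:cust1.ingestion_stg", rows);
        System.out.println("written=" + result.written.size()
            + " failed=" + result.failed.size());
    }
}
```

In a real pipeline the failed rows would be read from the PCollection returned by getFailedInserts() rather than from a list like the one used here.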



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
