yu-iskw opened a new issue, #31226:
URL: https://github.com/apache/beam/issues/31226

   ### What happened?
   
   ## Overview
   
   Our Dataflow pipeline is designed to transfer data from Google Pub/Sub to 
BigQuery. It utilizes a custom dynamic destination to dynamically determine the 
target BigQuery table based on the JSON content of the Pub/Sub message.
   
   We have identified a recurring issue where the pipeline encounters a failure 
if the specified destination table is absent in BigQuery, resulting in a 404 
Not Found error. The expected behavior is for the pipeline to manage such 
errors gracefully and attempt retries according to the defined retry policy.
   
   Despite configuring the pipeline with `InsertRetryPolicy.neverRetry()`, it 
continues to terminate with the same error upon encountering a non-existent 
table.
   
   ## Environment
   
   - Apache Beam SDK in Java: 2.56.0
   
   ## Desired behavior
   
   We aim to customize error handling and retry operations according to a 
specified retry policy. For example, we could implement a custom retry policy 
that specifically avoids retrying operations that result in a 404 Not Found 
error.
   
   ## Sample code
   
   Here is a snippet of the code that configures the pipeline to write to 
BigQuery:
   
   ```java
   WriteResult writeResult = 
convertedTableRows.get(TRANSFORM_OUT).apply("WriteSuccessfulRecords",
                   BigQueryIO.writeTableRows().withoutValidation()
                                   
.withCreateDisposition(CreateDisposition.CREATE_NEVER)
                                   
.withWriteDisposition(WriteDisposition.WRITE_APPEND)
                                   .withExtendedErrorInfo()
                                   
.withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                                   
.withFailedInsertRetryPolicy(InsertRetryPolicy.neverRetry())
                                   .ignoreUnknownValues()
                                   .to(new CustomBigQueryDynamicDestination()));
   ```
   
   ## Error message
   
   We have masked sensitive information like the project ID and dataset ID in 
the error message below:
   
   ```shell
   Error message from worker: java.lang.RuntimeException: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
   POST 
https://bigquery.googleapis.com/bigquery/v2/projects/[PROJECT_ID]/datasets/[DATASET_ID]/tables/no_such_table/insertAll?prettyPrint=false
   {
     "code" : 404,
     "errors" : [ {
       "domain" : "global",
       "message" : "Not found: Table [PROJECT_ID]:[DATASET_ID].no_such_table",
       "reason" : "notFound"
     } ],
     "message" : "Not found: Table [PROJECT_ID]:[DATASET_ID].no_such_table",
     "status" : "NOT_FOUND"
   }
           
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:1240)
           
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:1303)
           
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.flushRows(BatchedStreamingWrite.java:403)
           
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.access$900(BatchedStreamingWrite.java:67)
           
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements.finishBundle(BatchedStreamingWrite.java:286)
   Caused by: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
   POST 
https://bigquery.googleapis.com/bigquery/v2/projects/[PROJECT_ID]/datasets/[DATASET_ID]/tables/no_such_table/insertAll?prettyPrint=false
   {
     "code" : 404,
     "errors" : [ {
       "domain" : "global",
       "message" : "Not found: Table [PROJECT_ID]:[DATASET_ID].no_such_table",
       "reason" : "notFound"
     } ],
     "message" : "Not found: Table [PROJECT_ID]:[DATASET_ID].no_such_table",
     "status" : "NOT_FOUND"
   }
           
com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
           
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
           
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
           
com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:439)
           com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
           
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRe…
 [message truncated due to size]
   ```
   
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to