yu-iskw opened a new issue, #31226:
URL: https://github.com/apache/beam/issues/31226
### What happened?
## Overview
Our Dataflow pipeline is designed to transfer data from Google Pub/Sub to
BigQuery. It utilizes a custom dynamic destination to dynamically determine the
target BigQuery table based on the JSON content of the Pub/Sub message.
We have identified a recurring issue where the pipeline encounters a failure
if the specified destination table is absent in BigQuery, resulting in a 404
Not Found error. The expected behavior is for the pipeline to manage such
errors gracefully and attempt retries according to the defined retry policy.
Despite configuring the pipeline with `InsertRetryPolicy.neverRetry()`, it
continues to terminate with the same error upon encountering a non-existent
table.
## Environment
- Apache Beam SDK in Java: 2.56.0
## Desired behavior
We aim to customize error handling and retry operations according to a
specified retry policy. For example, we could implement a custom retry policy
that specifically avoids retrying operations that result in a 404 Not Found
error.
## Sample code
Here is a snippet of the code that configures the pipeline to write to
BigQuery:
```java
WriteResult writeResult =
convertedTableRows.get(TRANSFORM_OUT).apply("WriteSuccessfulRecords",
BigQueryIO.writeTableRows().withoutValidation()
.withCreateDisposition(CreateDisposition.CREATE_NEVER)
.withWriteDisposition(WriteDisposition.WRITE_APPEND)
.withExtendedErrorInfo()
.withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
.withFailedInsertRetryPolicy(InsertRetryPolicy.neverRetry())
.ignoreUnknownValues()
.to(new CustomBigQueryDynamicDestination()));
```
## Error message
We have masked sensitive information like the project ID and dataset ID in
the error message below:
```shell
Error message from worker: java.lang.RuntimeException:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST
https://bigquery.googleapis.com/bigquery/v2/projects/[PROJECT_ID]/datasets/[DATASET_ID]/tables/no_such_table/insertAll?prettyPrint=false
{
"code" : 404,
"errors" : [ {
"domain" : "global",
"message" : "Not found: Table [PROJECT_ID]:[DATASET_ID].no_such_table",
"reason" : "notFound"
} ],
"message" : "Not found: Table [PROJECT_ID]:[DATASET_ID].no_such_table",
"status" : "NOT_FOUND"
}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:1240)
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:1303)
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.flushRows(BatchedStreamingWrite.java:403)
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.access$900(BatchedStreamingWrite.java:67)
org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements.finishBundle(BatchedStreamingWrite.java:286)
Caused by:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST
https://bigquery.googleapis.com/bigquery/v2/projects/[PROJECT_ID]/datasets/[DATASET_ID]/tables/no_such_table/insertAll?prettyPrint=false
{
"code" : 404,
"errors" : [ {
"domain" : "global",
"message" : "Not found: Table [PROJECT_ID]:[DATASET_ID].no_such_table",
"reason" : "notFound"
} ],
"message" : "Not found: Table [PROJECT_ID]:[DATASET_ID].no_such_table",
"status" : "NOT_FOUND"
}
com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:439)
com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRe…
[message truncated due to size]
```
### Issue Priority
Priority: 3 (minor)
### Issue Components
- [ ] Component: Python SDK
- [X] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]