joyhaldar opened a new pull request, #15308: URL: https://github.com/apache/iceberg/pull/15308
**Description** Adds retry detection to BigQuery Metastore table commits to handle cases where a commit might have succeeded but we received an error, for example, a network timeout after the request reached the server. When retries occur during `create` or `update`, an earlier attempt may have actually succeeded. Without checking, we might incorrectly report failure or attempt duplicate commits. This addresses the [existing TODO](https://github.com/apache/iceberg/blob/444e381dc786d6b61d8325e9526b816f881307cc/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClientImpl.java#L330) in the codebase. **Changes** - Added `RetryDetector` utility that wraps callables and counts invocations - Added overloaded `create(Table, RetryDetector)` and `update(TableReference, Table, RetryDetector)` methods to `BigQueryMetastoreClient` - Updated `BigQueryTableOperations.doCommit()` to use `checkCommitStatus` when needed - Extracted `cleanupMetadata()` helper method **Notes for Reviewers** 1. I wanted to implement idempotency for BQMS tables during create/update, so I was checking other patterns and noticed that GlueCatalog has a RetryDetector mechanism. But that one is based on AWS MetricPublisher. 2. Turns out, [BigQueryRetryHelper](https://github.com/googleapis/java-bigquery/blob/ad438dc626b69f3bddda47ccf6b7a97d9053c047/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/BigQueryRetryHelper.java) ([used in BQMS](https://github.com/apache/iceberg/blob/444e381dc786d6b61d8325e9526b816f881307cc/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClientImpl.java#L49)) does not return number of retries. So I wrote a custom `RetryDetector` which wraps the create and update callables to count invocations. 3. Since I needed to pass the `RetryDetector` object to `create` and `update` calls, I added overloaded methods in [BigQueryMetastoreClient.java](https://github.com/apache/iceberg/blob/main/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClient.java). The 1-arg versions delegate to the 2-arg versions. 4. Only checking `RuntimeIOException` because [convertExceptionIfUnsuccessful()](https://github.com/apache/iceberg/blob/444e381dc786d6b61d8325e9526b816f881307cc/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClientImpl.java#L621) already converts HTTP errors to specific Iceberg exceptions. `RuntimeIOException` wraps `IOException`, which are network failures where we don't know if the request reached the server. Other exceptions like `CommitFailedException` or `AlreadyExistsException` are definitive server responses, so no need to check commit status for those. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
