joyhaldar opened a new pull request, #15308:
URL: https://github.com/apache/iceberg/pull/15308

   **Description**
   
   Adds retry detection to BigQuery Metastore table commits to handle cases 
where a commit might have succeeded but we received an error, for example, a 
network timeout after the request reached the server.
   
   When retries occur during `create` or `update`, an earlier attempt may have 
actually succeeded. Without checking, we might incorrectly report failure or 
attempt duplicate commits. This addresses the [existing 
TODO](https://github.com/apache/iceberg/blob/444e381dc786d6b61d8325e9526b816f881307cc/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClientImpl.java#L330)
 in the codebase.
   
   
   **Changes**
   - Added `RetryDetector` utility that wraps callables and counts invocations
   - Added overloaded `create(Table, RetryDetector)` and 
`update(TableReference, Table, RetryDetector)` methods to 
`BigQueryMetastoreClient`
   - Updated `BigQueryTableOperations.doCommit()` to use `checkCommitStatus` 
when needed
   - Extracted `cleanupMetadata()` helper method
   
   **Notes for Reviewers**
   
   1. I wanted to implement idempotency for BQMS tables during create/update, 
so I was checking other patterns and noticed that GlueCatalog has a 
RetryDetector mechanism. But that one is based on AWS MetricPublisher.
   2. Turns out, 
[BigQueryRetryHelper](https://github.com/googleapis/java-bigquery/blob/ad438dc626b69f3bddda47ccf6b7a97d9053c047/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/BigQueryRetryHelper.java)
 ([used in 
BQMS](https://github.com/apache/iceberg/blob/444e381dc786d6b61d8325e9526b816f881307cc/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClientImpl.java#L49))
 does not return number of retries. So I wrote a custom `RetryDetector` which 
wraps the create and update callables to count invocations.
   3. Since I needed to pass the `RetryDetector` object to `create` and 
`update` calls, I added overloaded methods in 
[BigQueryMetastoreClient.java](https://github.com/apache/iceberg/blob/main/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClient.java).
 The 1-arg versions delegate to the 2-arg versions.
   4. Only checking `RuntimeIOException` because 
[convertExceptionIfUnsuccessful()](https://github.com/apache/iceberg/blob/444e381dc786d6b61d8325e9526b816f881307cc/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClientImpl.java#L621)
 already converts HTTP errors to specific Iceberg exceptions. 
`RuntimeIOException` wraps `IOException`, which are network failures where we 
don't know if the request reached the server. Other exceptions like 
`CommitFailedException` or `AlreadyExistsException` are definitive server 
responses, so no need to check commit status for those.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to