503 is throttling on many (but not all) of the AWS services (s3, lambda, redshift..) - though not dynamodb or others that return 400 + some message.
If aws were to deploy an endpoint, they'd have to decide what to do. or even better, the spec could say "there is a specific error code for throttling, "429: Too Many Requests" to be returned in such a situation. On Sat, 2 Aug 2025 at 03:32, Dennis Huo <huoi...@gmail.com> wrote: > PR LGTM, thanks for following up on this! > > Reiterating my comments in the PR, IMO especially now that Russell's > https://github.com/apache/iceberg/pull/13449 is merged so that all the > read-only methods can still perform low-level retries, this change makes > sense as a safer default for commit behaviors. > > I could imagine a few potential follow-up features if people want to > achieve better availability in situations where more out-of-band knowledge > of specific server behaviors can be used on a case-by-case basis: > > 1. If anyone feels very strongly that in a given service environment some > particular 5xx code really is safely retriable without reconciliation, we > could consider extracting a runtime config that lists the non-retriable > HTTP codes instead of only having the hard-coded switch statement coming at > the cost of some code complexity > 2. We could make it easier to inject a > dynamically-classloaded HttpRequestRetryStrategy in the HttpClient ( > https://github.com/apache/iceberg/blob/a8fdb23682a7c083ea6ff9873f1531dd9d465aa7/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L127) > - it seems there's already some precedent for injectable helpers here such > as rest.client.tls.configurer-impl -- custom impls could either be as > simple as using a different set of 5xx non-retriable codes, or as complex > as cooperating with other customized client-side logic to use an end-to-end > idempotency request-uuid to distinguish safe retries > 3. We could consider letting other response headers such as "Retry-After" ( > https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Retry-After) > be a heuristic to interpret the *intent* of the server to communicate a > "graceful unavailability" that is safe to retry, vs an unexpected failure > where state is unknown > > In any case though, it does seem prudent to start with the safe default. > > On Wed, Jul 23, 2025 at 1:22 PM Prashant Singh <prashant010...@gmail.com> > wrote: > >> Hey All, >> Presently we treat 503 status code in spec here <http:///> as if >> the request was not tried at all. While doing 1.9.2 it was brought up that >> this exception might not be true and some of the very common services like >> Istio and Envoy were brought up as examples. >> >> As a result I wanted to formally bring up disabling retries on 503 on >> updateTable and treating that as CommitStateUnknown similar to what we do >> for 500, 502, 504. The rationale for this being 503 doesn't necessarily >> mean the request was not tried at all and systems like ENVOY can return 503 >> in the middle of processing. >> >> Adding detailed analysis of my colleague Dennis here : >> https://lists.apache.org/thread/h8xowcntomh7r5woz24cv27v3o8r9wgd >> >> It seemed like most people participating in the thread above agreed, >> starting this thread to see if there are objections to it in a wider forum. >> >> Meanwhile, I have filed a PR for the same >> [1] https://github.com/apache/iceberg/pull/13619 >> >> Please let us know your thoughts and provide your feedback ! >> >> Best, >> Prashant Singh >> >> >> >>