Re: [DISCUSS] Mark 503 error code as non-retryable for updateTable

Steve Loughran Tue, 05 Aug 2025 11:00:47 -0700

 503 is  throttling on many (but not all) of the AWS services (s3, lambda,
redshift..) - though not dynamodb or others that return 400 + some message.


If aws were to deploy an endpoint, they'd have to decide what to do. or
even better, the spec could say "there is a specific error code for
throttling, "429: Too Many Requests" to be returned in such a situation.

On Sat, 2 Aug 2025 at 03:32, Dennis Huo <huoi...@gmail.com> wrote:

> PR LGTM, thanks for following up on this!
>
> Reiterating my comments in the PR, IMO especially now that Russell's
> https://github.com/apache/iceberg/pull/13449 is merged so that all the
> read-only methods can still perform low-level retries, this change makes
> sense as a safer default for commit behaviors.
>
> I could imagine a few potential follow-up features if people want to
> achieve better availability in situations where more out-of-band knowledge
> of specific server behaviors can be used on a case-by-case basis:
>
> 1. If anyone feels very strongly that in a given service environment some
> particular 5xx code really is safely retriable without reconciliation, we
> could consider extracting a runtime config that lists the non-retriable
> HTTP codes instead of only having the hard-coded switch statement coming at
> the cost of some code complexity
> 2. We could make it easier to inject a
> dynamically-classloaded HttpRequestRetryStrategy in the HttpClient (
> https://github.com/apache/iceberg/blob/a8fdb23682a7c083ea6ff9873f1531dd9d465aa7/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L127)
> - it seems there's already some precedent for injectable helpers here such
> as rest.client.tls.configurer-impl -- custom impls could either be as
> simple as using a different set of 5xx non-retriable codes, or as complex
> as cooperating with other customized client-side logic to use an end-to-end
> idempotency request-uuid to distinguish safe retries
> 3. We could consider letting other response headers such as "Retry-After" (
> https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Retry-After)
> be a heuristic to interpret the *intent* of the server to communicate a
> "graceful unavailability" that is safe to retry, vs an unexpected failure
> where state is unknown
>
> In any case though, it does seem prudent to start with the safe default.
>
> On Wed, Jul 23, 2025 at 1:22 PM Prashant Singh <prashant010...@gmail.com>
> wrote:
>
>> Hey All,
>> Presently we treat 503 status code in spec here <http:///> as if
>> the request was not tried at all. While doing 1.9.2 it was brought up that
>> this exception might not be true and some of the very common services like
>> Istio and Envoy were brought up as examples.
>>
>> As a result I wanted to formally bring up disabling retries on 503 on
>> updateTable and treating that as CommitStateUnknown similar to what we do
>> for 500, 502, 504. The rationale for this being 503 doesn't necessarily
>> mean the request was not tried at all and systems like ENVOY can return 503
>> in the middle of processing.
>>
>> Adding detailed analysis of my colleague Dennis here :
>> https://lists.apache.org/thread/h8xowcntomh7r5woz24cv27v3o8r9wgd
>>
>> It seemed like most people participating in the thread above agreed,
>> starting this thread to see if there are objections to it in a wider forum.
>>
>> Meanwhile, I have filed a PR for the same
>> [1] https://github.com/apache/iceberg/pull/13619
>>
>> Please let us know your thoughts and provide your feedback !
>>
>> Best,
>> Prashant Singh
>>
>>
>>
>>

Re: [DISCUSS] Mark 503 error code as non-retryable for updateTable

Reply via email to