I'm also leaning toward 503, even though it's not perfect. Thanks for all
the discussion.

Yufei


On Thu, Jun 11, 2026 at 1:38 PM Dmitri Bourlatchkov <[email protected]>
wrote:

> Hi All,
>
> Let's go with a simple 503 first.
>
> I suppose this failure mode is not prevalent enough to justify implementing
> retries right away. If users complain, we can add server-side retries later.
>
> Cheers,
> Dmitri.
>
> On Thu, Jun 11, 2026 at 4:03 PM Nándor Kollár <[email protected]> wrote:
>
> > Hi All,
> >
> > I don’t have a strong preference either; in fact, both 429 and 503 are
> > suboptimal choices. After reading the RFCs, I’ve updated my
> > preference: 503 might be the better option, since 429 indicates that
> > the client is sending too many requests within a given timeframe. In
> > our case, this seems more like a server-side issue. Should we conclude
> > that 503 + retry is the least bad approach here? Also, should the
> > retry be added in a separate PR, or would it be better to implement
> > both in a single one?
> >
> > Cheers,
> > Nandor
> >
> > Dmitri Bourlatchkov <[email protected]> wrote (on Thu, Jun 11, 2026,
> > 16:44):
> > >
> > > Hi Alex,
> > >
> > > My personal interpretation is that 429 is appropriate for this case
> > > _with_ retries. Reading the RFC [1], I think it is not too much of a
> > > stretch, because clearly TARGET_ENTITY_CONCURRENTLY_MODIFIED can occur
> > > only if clients are submitting (a lot of) concurrent updates to the same
> > > table and the server is unable to handle the rename because of that
> > > (assuming it retries with a timeout). The server is within its purview
> > > to request clients to slow down. The only skew is that we cannot send
> > > 429 to all involved clients equally in this case.
> > >
> > > As for 503, after re-reading its RFC [2], I think it matches our
> > > situation too. If we're not introducing retries right now, I agree that
> > > 503 is probably a better option.
> > >
> > > [1] https://www.rfc-editor.org/info/rfc6585/#section-4
> > >
> > > [2] https://www.rfc-editor.org/info/rfc9110/#status.503
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Thu, Jun 11, 2026 at 8:30 AM Alexandre Dutra <[email protected]>
> > > wrote:
> > >
> > > > The server-side retry idea is interesting, but it won't eliminate the
> > > > problem completely, will it?
> > > >
> > > > If not, I'd suggest pursuing the server-side retry idea as a separate
> > > > effort.
> > > >
> > > > We still need to settle on what status code to return for
> > > > TARGET_ENTITY_CONCURRENTLY_MODIFIED:
> > > >
> > > > - 429 (without the Retry-After header)
> > > > - 503
> > > >
> > > > I still think 503 is slightly preferable, but won't fight for it
> > > > either.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > On Thu, Jun 11, 2026 at 10:24 AM Nándor Kollár <[email protected]>
> > > > wrote:
> > > > >
> > > > > Thanks, Dmitri, for the explanation. It now makes sense to me to
> > > > > handle the retries on the server side. If there's still a conflict
> > > > > after a couple of retry attempts, then a 429 response code seems
> > > > > reasonable to me.
> > > > >
> > > > > Thanks,
> > > > > Nandor
> > > > >
> > > > > Dmitri Bourlatchkov <[email protected]> wrote (on Thu, Jun 11, 2026,
> > > > > 0:06):
> > > > > >
> > > > > > Hi Nandor,
> > > > > >
> > > > > > Rename is fundamentally different from other table operations in
> > > > > > that it only affects a catalog-owned piece of data, which is the
> > > > > > name.
> > > > > >
> > > > > > Table metadata or properties are not affected.
> > > > > >
> > > > > > I imagine the likely conflict is between a rename and a metadata
> > > > > > update, which is conceptually not a client-side conflict. The server
> > > > > > should be able to handle it locally.
> > > > > >
> > > > > > However, metadata update conflicts have to bounce to the client
> > > > > > because Polaris cannot resolve them in most cases. The only
> > > > > > exception AFAIK is the compact/update conflict [1285].
> > > > > >
> > > > > > A rename/rename conflict will bounce to the client as a 404 on the
> > > > > > first retry.
> > > > > >
> > > > > > [1285] https://github.com/apache/polaris/pull/1285
> > > > > >
> > > > > > Cheers,
> > > > > > Dmitri.
> > > > > >
> > > > > > On Wed, Jun 10, 2026 at 3:47 PM Nándor Kollár <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I'm not against a server-side retry, but I think in that case we
> > > > > > > should do the same for other table update operations, no? That
> > > > > > > sounds like a more consistent approach.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Nandor
> > > > > > >
> > > > > > > Dmitri Bourlatchkov <[email protected]> wrote (on Wed, Jun 10, 2026,
> > > > > > > 15:41):
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > How about we make Polaris retry the rename a few times on the
> > > > > > > > server side? If it gets TARGET_ENTITY_CONCURRENTLY_MODIFIED every
> > > > > > > > time, we eventually fail with 429.
> > > > > > > >
> > > > > > > > Prolonged optimistic lock failures probably mean that there are,
> > > > > > > > indeed, too many requests. Ideally we should respond with 429 on
> > > > > > > > all requests clashing on the entity in question (not just the
> > > > > > > > rename), but I guess it is not technically feasible ATM.
> > > > > > > >
> > > > > > > > WDYT?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Wed, Jun 10, 2026 at 9:10 AM Alexandre Dutra <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I also reviewed the PR and left some comments. Just to summarize
> > > > > > > > > my thoughts:
> > > > > > > > >
> > > > > > > > > - ENTITY_CANNOT_BE_RESOLVED and CATALOG_PATH_CANNOT_BE_RESOLVED
> > > > > > > > > should be mapped to 404 -> NoSuchNamespaceException. The comments
> > > > > > > > > for them in BaseResult are imho inaccurate: these statuses are
> > > > > > > > > non-retriable.
> > > > > > > > >
> > > > > > > > > - TARGET_ENTITY_CONCURRENTLY_MODIFIED: unfortunately 409 is
> > > > > > > > > precluded because of the Iceberg spec. I am not a big fan of 429
> > > > > > > > > because it could force clients to throttle. So, I think the
> > > > > > > > > current proposal of 503 -> ServiceUnavailableException is the
> > > > > > > > > least bad choice (as it's retriable).
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > > > On Wed, Jun 10, 2026 at 11:18 AM Nándor Kollár <[email protected]>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > The Iceberg REST spec doesn't appear to define 429 as a valid
> > > > > > > > > > response status for rename operations, and I don't think it's an
> > > > > > > > > > ideal choice either, since it typically indicates rate-limiting
> > > > > > > > > > issues rather than conflicting updates.
> > > > > > > > > >
> > > > > > > > > > In my opinion, 409 would be the most appropriate status code, but
> > > > > > > > > > the REST spec reserves it for a different purpose. Perhaps 428
> > > > > > > > > > Precondition Required could be used to signal a conflict, but
> > > > > > > > > > that status is generally intended for GET-then-PUT concurrency
> > > > > > > > > > scenarios, which doesn't seem to match this case.
> > > > > > > > > >
> > > > > > > > > > I think we'll be diverging from the Iceberg spec either way,
> > > > > > > > > > since it doesn't define a response code for conflicting rename
> > > > > > > > > > operations. Given that, it's probably better to use a status code
> > > > > > > > > > that isn't defined by the spec at all (such as 429) than to reuse
> > > > > > > > > > one that the spec already assigns a different meaning to.
> > > > > > > > > > Considering this, I vote for 429 as the least bad option.
> > > > > > > > > >
> > > > > > > > > > As for ENTITY_CANNOT_BE_RESOLVED, it sounds like a 404 to me too.
> > > > > > > > > > However, the comment suggests that it may be used for conflict
> > > > > > > > > > scenarios as well, and that the client should retry:
> > > > > > > > > >
> > > > > > > > > > // the specified entity (and its path) cannot be resolved. There is a
> > > > > > > > > > // possibility that by the time a call is made by the client to the
> > > > > > > > > > // persistent storage, something has changed due to concurrent
> > > > > > > > > > // modification(s). The client should retry in that case.
> > > > > > > > > > ENTITY_CANNOT_BE_RESOLVED(4),
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Nandor
> > > > > > > > > >
> > > > > > > > > > Dmitri Bourlatchkov <[email protected]> wrote (on Wed, Jun 10, 2026,
> > > > > > > > > > 0:43):
> > > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > I reviewed PR [4646] (but did not leave any comments in GH,
> > > > > > > > > > > replying here) and the current 500 error is most certainly not
> > > > > > > > > > > correct for this failure mode. 503 is not ideal either, as I
> > > > > > > > > > > commented earlier.
> > > > > > > > > > >
> > > > > > > > > > > From the PR I gather that people are generally uncomfortable
> > > > > > > > > > > returning a 409 response because it has a narrow meaning in the
> > > > > > > > > > > Iceberg REST API spec. It is a fair point.
> > > > > > > > > > >
> > > > > > > > > > > Re: the TARGET_ENTITY_CONCURRENTLY_MODIFIED case. How about 429
> > > > > > > > > > > (Too Many Requests)?
> > > > > > > > > > >
> > > > > > > > > > > 429 is clearly retryable and does not carry any implications
> > > > > > > > > > > about the state of the system after handling the request.
> > > > > > > > > > >
> > > > > > > > > > > The message could say "Unable to rename entity due to overlapping
> > > > > > > > > > > concurrent modifications". We do not have to set the Retry-After
> > > > > > > > > > > header.
> > > > > > > > > > >
> > > > > > > > > > > Re: ENTITY_CANNOT_BE_RESOLVED. I believe this is a solid 404
> > > > > > > > > > > case.
> > > > > > > > > > >
> > > > > > > > > > > WDYT?
> > > > > > > > > > >
> > > > > > > > > > > [4646] https://github.com/apache/polaris/pull/4646
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Dmitri.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 9, 2026 at 12:02 AM Dmitri Bourlatchkov <[email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Nandor,
> > > > > > > > > > > >
> > > > > > > > > > > > Good question :)
> > > > > > > > > > > >
> > > > > > > > > > > > I did not read the PR yet, but my gut feeling is towards the 409
> > > > > > > > > > > > error code because 5xx generally means a fundamental issue with
> > > > > > > > > > > > the service that goes beyond the scope of client requests.
> > > > > > > > > > > >
> > > > > > > > > > > > From a more general perspective, traditional HTTP status codes
> > > > > > > > > > > > are often too narrow to express all the minute details of API
> > > > > > > > > > > > errors. My personal view is that a rich payload object in the
> > > > > > > > > > > > response can be useful in such cases... but again that will
> > > > > > > > > > > > require a spec change.
> > > > > > > > > > > >
> > > > > > > > > > > > That said, if the request does not require additional client
> > > > > > > > > > > > input for a retry, Polaris should retry. I assume we can refactor
> > > > > > > > > > > > the code to clearly distinguish retryable and non-retryable
> > > > > > > > > > > > failures on the server side. That part should not require spec
> > > > > > > > > > > > changes.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Dmitri.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 8, 2026 at 9:48 AM Nándor Kollár <[email protected]>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Hi all,
> > > > > > > > > > > >>
> > > > > > > > > > > >> I'd like to ask for the community's opinion on the appropriate
> > > > > > > > > > > >> response status code for table/view rename operations when
> > > > > > > > > > > >> there is a conflicting operation in progress.
> > > > > > > > > > > >>
> > > > > > > > > > > >> A PR was recently raised [1], which I believe highlighted the
> > > > > > > > > > > >> question of what the correct status code should be in such
> > > > > > > > > > > >> conflict scenarios. To me, the Iceberg REST Catalog
> > > > > > > > > > > >> specification does not clearly address this case. Neither 409
> > > > > > > > > > > >> Conflict nor 503 Service Unavailable seems entirely appropriate
> > > > > > > > > > > >> for indicating to the client that the operation could not be
> > > > > > > > > > > >> completed due to a conflict and that retrying the operation may
> > > > > > > > > > > >> succeed.
> > > > > > > > > > > >>
> > > > > > > > > > > >> I think 409 Conflict might be the better choice, but that would
> > > > > > > > > > > >> require a change to the specification. It would also end up
> > > > > > > > > > > >> serving two different purposes: a non-retriable scenario, where
> > > > > > > > > > > >> the target name is already reserved, and a retriable scenario,
> > > > > > > > > > > >> where the operation failed due to a temporary conflict. What do
> > > > > > > > > > > >> you think?
> > > > > > > > > > > >>
> > > > > > > > > > > >> [1] https://github.com/apache/polaris/pull/4646
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thanks,
> > > > > > > > > > > >> Nandor
> > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Kollár Nándor
> > > > > > > > >
> > > > > > >
> > > >
> >
>
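
For readers skimming the thread, the outcome it converges on (404 for ENTITY_CANNOT_BE_RESOLVED and CATALOG_PATH_CANNOT_BE_RESOLVED, 503 for TARGET_ENTITY_CONCURRENTLY_MODIFIED, with a bounded server-side retry as a possible follow-up) could be sketched roughly as below. This is only an illustration under stated assumptions: the class `RenameStatusSketch`, the methods `toHttpStatus` and `renameWithRetry`, the `SUCCESS` constant and the 204 success code are hypothetical and are not taken from the Polaris code base or from PR 4646.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are the actual Polaris API. */
final class RenameStatusSketch {

  /** Subset of result codes named in the thread; SUCCESS is assumed for illustration. */
  enum ReturnStatus {
    SUCCESS,
    ENTITY_CANNOT_BE_RESOLVED,
    CATALOG_PATH_CANNOT_BE_RESOLVED,
    TARGET_ENTITY_CONCURRENTLY_MODIFIED
  }

  /** Maps a rename result to the HTTP status the thread settles on. */
  static int toHttpStatus(ReturnStatus status) {
    switch (status) {
      case SUCCESS:
        return 204; // assumption: rename succeeds with no response body
      case ENTITY_CANNOT_BE_RESOLVED:
      case CATALOG_PATH_CANNOT_BE_RESOLVED:
        return 404; // not found, non-retriable
      case TARGET_ENTITY_CONCURRENTLY_MODIFIED:
        return 503; // retriable by the client; 409 is reserved by the Iceberg spec
      default:
        throw new IllegalArgumentException("unmapped status: " + status);
    }
  }

  /**
   * Possible follow-up: run the rename at least once and at most maxAttempts
   * times, stopping as soon as the result is not a concurrent-modification conflict.
   */
  static ReturnStatus renameWithRetry(Supplier<ReturnStatus> attempt, int maxAttempts) {
    ReturnStatus last = attempt.get();
    for (int i = 1;
        i < maxAttempts && last == ReturnStatus.TARGET_ENTITY_CONCURRENTLY_MODIFIED;
        i++) {
      last = attempt.get();
    }
    return last;
  }
}
```

With this shape, a caller would compute `toHttpStatus(renameWithRetry(doRename, 3))`, so a conflict surfaces as 503 only after the retry budget is exhausted. A real implementation would probably also want a backoff or timeout between attempts, which is omitted here.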
