I'm also leaning toward 503 even though it's not perfect. Thanks for all the discussions.
Yufei

On Thu, Jun 11, 2026 at 1:38 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi All,
>
> Let's go with a simple 503 first.
>
> I suppose this failure mode is not prevalent enough to justify implementing
> retries right away. If users complain, we can add server-side retries later.
>
> Cheers,
> Dmitri.
>
> On Thu, Jun 11, 2026 at 4:03 PM Nándor Kollár <[email protected]> wrote:
>
> > Hi All,
> >
> > I don’t have a strong preference either; in fact, both 429 and 503 are
> > suboptimal choices. After reading the RFCs, I’ve updated my
> > preference: 503 might be the better option, since 429 indicates that
> > the client is sending too many requests within a given timeframe. In
> > our case, this seems more like a server-side issue. Should we conclude
> > that 503 + retry is the least bad approach here? Also, should the
> > retry be added in a separate PR, or would it be better to implement
> > both in a single one?
> >
> > Cheers,
> > Nandor
> >
> > On Thu, Jun 11, 2026 at 16:44, Dmitri Bourlatchkov <[email protected]> wrote:
> > >
> > > Hi Alex,
> > >
> > > My personal interpretation is that 429 is appropriate for this case _with_
> > > retries. Reading the RFC [1], I think it is not too much of a stretch,
> > > because clearly TARGET_ENTITY_CONCURRENTLY_MODIFIED can occur only if
> > > clients are submitting (a lot of) concurrent updates to the same table and
> > > the server is unable to handle the rename because of that (assuming it
> > > retries with a timeout). The server is within its purview to request
> > > clients to slow down. The only skew is that we cannot send 429 to all
> > > involved clients equally in this case.
> > >
> > > As for 503, after re-reading its RFC [2], I think it matches our situation
> > > too. If we're not introducing retries right now, I agree that 503 is
> > > probably a better option.
> > >
> > > [1] https://www.rfc-editor.org/info/rfc6585/#section-4
> > >
> > > [2] https://www.rfc-editor.org/info/rfc9110/#status.503
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Thu, Jun 11, 2026 at 8:30 AM Alexandre Dutra <[email protected]> wrote:
> > >
> > > > The server-side retry idea is interesting, but it won't eliminate the
> > > > problem completely, will it?
> > > >
> > > > If not, I'd suggest pursuing the server-side retry idea as a separate
> > > > effort.
> > > >
> > > > We still need to settle on what status code to return for
> > > > TARGET_ENTITY_CONCURRENTLY_MODIFIED:
> > > >
> > > > - 429 (without the Retry-After header)
> > > > - 503
> > > >
> > > > I still think 503 is slightly preferable, but won't fight for it either.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > On Thu, Jun 11, 2026 at 10:24 AM Nándor Kollár <[email protected]> wrote:
> > > > >
> > > > > Thanks, Dmitri, for the explanation. It now makes sense to me to
> > > > > handle the retries on the server side. If there's still a conflict
> > > > > after a couple of retry attempts, then a 429 response code seems
> > > > > reasonable to me.
> > > > >
> > > > > Thanks,
> > > > > Nandor
> > > > >
> > > > > On Thu, Jun 11, 2026 at 0:06, Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > >
> > > > > > Hi Nandor,
> > > > > >
> > > > > > Rename is fundamentally different from other table operations in that it
> > > > > > only affects a catalog-owned piece of data, which is the name.
> > > > > >
> > > > > > Table metadata or properties are not affected.
> > > > > >
> > > > > > I imagine the likely conflict is between a rename and a metadata update,
> > > > > > which is conceptually not a client-side conflict. The server should be
> > > > > > able to handle it locally.
> > > > > >
> > > > > > However, metadata update conflicts have to bounce to the client
> > > > > > because Polaris cannot resolve them in most cases. The only exception
> > > > > > AFAIK is the compact/update conflict [1285].
> > > > > >
> > > > > > A rename/rename conflict will bounce to the client as a 404 on the first
> > > > > > retry.
> > > > > >
> > > > > > [1285] https://github.com/apache/polaris/pull/1285
> > > > > >
> > > > > > Cheers,
> > > > > > Dmitri.
> > > > > >
> > > > > > On Wed, Jun 10, 2026 at 3:47 PM Nándor Kollár <[email protected]> wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I'm not against a server-side retry, but I think in that case we
> > > > > > > should do the same for other table update operations, no? That sounds
> > > > > > > like a more consistent approach.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Nandor
> > > > > > >
> > > > > > > On Wed, Jun 10, 2026 at 15:41, Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > How about we make Polaris retry the rename a few times on the server
> > > > > > > > side? If it gets TARGET_ENTITY_CONCURRENTLY_MODIFIED every time, we
> > > > > > > > eventually fail with 429.
> > > > > > > >
> > > > > > > > Prolonged optimistic lock failures probably mean that there are, indeed,
> > > > > > > > too many requests. Ideally we should respond with 429 on all requests
> > > > > > > > clashing on the entity in question (not just the rename), but I guess it
> > > > > > > > is not technically feasible ATM.
> > > > > > > >
> > > > > > > > WDYT?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Wed, Jun 10, 2026 at 9:10 AM Alexandre Dutra <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I also reviewed the PR and left some comments. Just to summarize my
> > > > > > > > > thoughts:
> > > > > > > > >
> > > > > > > > > - ENTITY_CANNOT_BE_RESOLVED and CATALOG_PATH_CANNOT_BE_RESOLVED should
> > > > > > > > > be mapped to 404 -> NoSuchNamespaceException. The comments for them in
> > > > > > > > > BaseResult are imho inaccurate (they are non-retriable).
> > > > > > > > >
> > > > > > > > > - TARGET_ENTITY_CONCURRENTLY_MODIFIED: unfortunately 409 is precluded
> > > > > > > > > because of the Iceberg spec. I am not a big fan of 429 because it
> > > > > > > > > could force clients to throttle. So, I think the current proposal of
> > > > > > > > > 503 -> ServiceUnavailableException is the least worst choice (as it's
> > > > > > > > > retriable).
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > > > On Wed, Jun 10, 2026 at 11:18 AM Nándor Kollár <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > The Iceberg REST spec doesn't appear to define 429 as a valid response
> > > > > > > > > > status for rename operations, and I don't think it's an ideal choice
> > > > > > > > > > either, since it typically indicates rate-limiting issues rather than
> > > > > > > > > > conflicting updates.
> > > > > > > > > >
> > > > > > > > > > In my opinion, 409 would be the most appropriate status code, but the
> > > > > > > > > > REST spec reserves it for a different purpose.
> > > > > > > > > > Perhaps 428 Precondition Required could be used to signal a conflict,
> > > > > > > > > > but that status is generally intended for GET-then-PUT concurrency
> > > > > > > > > > scenarios, which doesn't seem to match this case.
> > > > > > > > > >
> > > > > > > > > > I think we'll be diverging from the Iceberg spec either way, since it
> > > > > > > > > > doesn't define a response code for conflicting rename operations.
> > > > > > > > > > Given that, it's probably better to use a status code that isn't
> > > > > > > > > > defined by the spec at all (such as 429) than to reuse one that the
> > > > > > > > > > spec already assigns a different meaning to. Considering this, I vote
> > > > > > > > > > for 429 as the least worst option.
> > > > > > > > > >
> > > > > > > > > > As for ENTITY_CANNOT_BE_RESOLVED, it sounds like a 404 to me too.
> > > > > > > > > > However, the comment suggests that it may be used for conflict
> > > > > > > > > > scenarios as well, and that the client should retry:
> > > > > > > > > >
> > > > > > > > > > // the specified entity (and its path) cannot be resolved. There is a
> > > > > > > > > > // possibility that by the time a call is made by the client to the
> > > > > > > > > > // persistent storage, something has changed due to concurrent
> > > > > > > > > > // modification(s). The client should retry in that case.
> > > > > > > > > > ENTITY_CANNOT_BE_RESOLVED(4),
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Nandor
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 10, 2026 at 0:43, Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > I reviewed PR [4646] (but did not leave any comments in GH, replying
> > > > > > > > > > > here) and the current 500 error is most certainly not correct for this
> > > > > > > > > > > failure mode. 503 is not ideal either, as I commented earlier.
> > > > > > > > > > >
> > > > > > > > > > > From the PR I gather that people are generally uncomfortable returning
> > > > > > > > > > > a 409 response because it has a narrow meaning in the Iceberg REST API
> > > > > > > > > > > spec. It is a fair point.
> > > > > > > > > > >
> > > > > > > > > > > Re: the TARGET_ENTITY_CONCURRENTLY_MODIFIED case. How about 429 (Too
> > > > > > > > > > > Many Requests)?
> > > > > > > > > > >
> > > > > > > > > > > 429 is clearly retryable and does not carry any implications about the
> > > > > > > > > > > state of the system after handling the request.
> > > > > > > > > > >
> > > > > > > > > > > The message could say "Unable to rename entity due to overlapping
> > > > > > > > > > > concurrent modifications". We do not have to set the Retry-After
> > > > > > > > > > > header.
> > > > > > > > > > >
> > > > > > > > > > > Re: ENTITY_CANNOT_BE_RESOLVED. I believe this is a solid 404 case.
> > > > > > > > > > >
> > > > > > > > > > > WDYT?
> > > > > > > > > > >
> > > > > > > > > > > [4646] https://github.com/apache/polaris/pull/4646
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Dmitri.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 9, 2026 at 12:02 AM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Nandor,
> > > > > > > > > > > >
> > > > > > > > > > > > Good question :)
> > > > > > > > > > > >
> > > > > > > > > > > > I did not read the PR yet, but my gut feeling is toward the 409 error
> > > > > > > > > > > > code, because 5xx generally means a fundamental issue with the service
> > > > > > > > > > > > that goes beyond the scope of client requests.
> > > > > > > > > > > >
> > > > > > > > > > > > From a more general perspective, traditional HTTP status codes are
> > > > > > > > > > > > often too narrow to express all the minute error details of an API. My
> > > > > > > > > > > > personal view is that a rich payload object in the response can be
> > > > > > > > > > > > useful in such cases... but again that will require a spec change.
> > > > > > > > > > > >
> > > > > > > > > > > > That said, if the request does not require additional client input for
> > > > > > > > > > > > a retry, Polaris should retry. I assume we can refactor the code to
> > > > > > > > > > > > clearly distinguish retryable and non-retryable failures on the server
> > > > > > > > > > > > side. That part should not require spec changes.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Dmitri.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 8, 2026 at 9:48 AM Nándor Kollár <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Hi all,
> > > > > > > > > > > >>
> > > > > > > > > > > >> I'd like to ask for the community's opinion on the appropriate
> > > > > > > > > > > >> response status code for table/view rename operations when there is a
> > > > > > > > > > > >> conflicting operation in progress.
> > > > > > > > > > > >>
> > > > > > > > > > > >> A PR was recently raised [1], which I believe highlighted the question
> > > > > > > > > > > >> of what the correct status code should be in such conflict scenarios.
> > > > > > > > > > > >> To me, the Iceberg REST Catalog specification does not clearly address
> > > > > > > > > > > >> this case. Neither 409 Conflict nor 503 Service Unavailable seems
> > > > > > > > > > > >> entirely appropriate for indicating to the client that the operation
> > > > > > > > > > > >> could not be completed due to a conflict and that retrying the
> > > > > > > > > > > >> operation may succeed.
> > > > > > > > > > > >>
> > > > > > > > > > > >> I think 409 Conflict might be the better choice, but that would
> > > > > > > > > > > >> require a change to the specification. It would also end up serving
> > > > > > > > > > > >> two different purposes: a non-retriable scenario, where the target
> > > > > > > > > > > >> name is already reserved, and a retriable scenario, where the
> > > > > > > > > > > >> operation failed due to a temporary conflict. What do you think?
> > > > > > > > > > > >>
> > > > > > > > > > > >> [1] https://github.com/apache/polaris/pull/4646
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thanks,
> > > > > > > > > > > >> Nandor
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Kollár Nándor
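
For reference, the mapping the thread converges on could be sketched roughly as below. This is only an illustration, not the actual Polaris code: the `RenameStatusSketch` class, the trimmed-down `ReturnStatus` enum, and both method names are hypothetical, and the 204 for a successful rename is an assumption based on the Iceberg REST rename endpoint returning no content. The bounded server-side retry is the possible follow-up mentioned above, not something agreed for the first PR.

```java
import java.util.function.Supplier;

final class RenameStatusSketch {

  // Hypothetical stand-in for the result codes named in the thread.
  enum ReturnStatus {
    SUCCESS,
    ENTITY_CANNOT_BE_RESOLVED,
    CATALOG_PATH_CANNOT_BE_RESOLVED,
    TARGET_ENTITY_CONCURRENTLY_MODIFIED
  }

  // Maps a rename result to an HTTP status code, following the thread:
  // unresolvable entity or path -> 404, concurrent modification -> 503.
  static int httpStatusFor(ReturnStatus status) {
    switch (status) {
      case SUCCESS:
        return 204; // assumption: a successful rename returns no content
      case ENTITY_CANNOT_BE_RESOLVED:
      case CATALOG_PATH_CANNOT_BE_RESOLVED:
        return 404;
      case TARGET_ENTITY_CONCURRENTLY_MODIFIED:
        return 503;
      default:
        return 500;
    }
  }

  // Possible follow-up: call the rename attempt up to maxAttempts times,
  // retrying only while it reports a concurrent modification.
  static ReturnStatus renameWithRetry(Supplier<ReturnStatus> attempt, int maxAttempts) {
    ReturnStatus last = attempt.get();
    for (int i = 1;
        i < maxAttempts && last == ReturnStatus.TARGET_ENTITY_CONCURRENTLY_MODIFIED;
        i++) {
      last = attempt.get();
    }
    return last;
  }

  public static void main(String[] args) {
    // Simulate a rename that keeps hitting a concurrent modification.
    ReturnStatus result =
        renameWithRetry(() -> ReturnStatus.TARGET_ENTITY_CONCURRENTLY_MODIFIED, 3);
    System.out.println(result + " -> HTTP " + httpStatusFor(result));
  }
}
```

With the retry left out, a conflicting rename maps straight to 503 and the client decides whether to retry. With it, the client only sees 503 once the server has used up its attempts.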
