FANNG1 commented on PR #10675: URL: https://github.com/apache/gravitino/pull/10675#issuecomment-4205268980
@jerryshao @royi My biggest concern is still the `atomic` semantics. The current two-phase `validate-then-commit` approach is definitely better than committing table by table directly, because it avoids cases where some requirements have already failed before any commit starts. But once it enters the commit phase, it is still only `best-effort`, not a truly atomic multi-table commit: if one table commit succeeds and a later one fails, we can still end up in a partial-success state. So from my perspective, real atomicity has to come from the underlying catalog/metastore capability, not just from the REST layer. For Gravitino-owned catalogs, a stronger solution would likely require something like staging the multi-table metadata changes first, then switching the metadata pointers together in a final atomic/CAS-style step.

Polaris seems to go a bit further in this direction, but it is still important to be precise about the semantics: its `commitTransaction` is not a full distributed 2PC either. As I understand it, Polaris:
- checks authorization for each table in the request
- groups and validates updates per table
- stages metastore entity changes in a transaction workspace
- only at the end performs a batch CAS to switch multiple metadata pointers together

That gives stronger semantics for Polaris-managed metadata pointers, but it is still not a fully global atomic commit, since metadata files may already have been written before the final CAS, and failures can still leave unreferenced metadata behind.

Would like to get your thoughts: should we first merge a `best-effort` implementation like this, with the semantics clearly documented, or wait until we have stronger catalog-level support before exposing this endpoint?
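To make the semantic difference concrete, here is a minimal Python sketch contrasting the two commit strategies. The `PointerStore` class and its methods are purely illustrative (not Gravitino's or Polaris's actual APIs); it models a metastore as a map from table name to metadata pointer:

```python
import threading

class PointerStore:
    """Toy metastore: table name -> metadata pointer (e.g. a file path)."""

    def __init__(self, pointers):
        self._pointers = dict(pointers)
        self._lock = threading.Lock()

    def get(self, table):
        return self._pointers[table]

    def commit_each(self, updates, fail_on=None):
        """Best-effort: commit table by table. A mid-sequence failure
        leaves earlier tables updated and later ones untouched.
        `updates` maps table -> (expected_pointer, new_pointer);
        `fail_on` simulates a failure at that table."""
        for table, (expected, new) in updates.items():
            if table == fail_on:
                raise RuntimeError(f"commit failed on {table}")
            with self._lock:
                if self._pointers[table] != expected:
                    raise RuntimeError(f"conflict on {table}")
                self._pointers[table] = new

    def commit_batch_cas(self, updates):
        """Batch CAS: validate every expected pointer, then swap all
        pointers inside one critical section, or change nothing."""
        with self._lock:
            for table, (expected, _) in updates.items():
                if self._pointers[table] != expected:
                    raise RuntimeError(f"conflict on {table}")
            for table, (_, new) in updates.items():
                self._pointers[table] = new
```

Note that even `commit_batch_cas` only makes the pointer switch atomic: any metadata files written before the final swap are not rolled back on failure, which is exactly the unreferenced-metadata caveat above.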
