Thanks for the updated proposal, Drew!
I prefer using catalog-authored timestamps because they minimize changes
to the REST spec, which helps preserve backwards compatibility. I have
quickly put together a draft proposal on how this could work. Looking
forward to feedback and discussion.

Draft Proposal: Catalog-Authored Timestamps for Apache Iceberg REST Catalog
<https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>

Thanks,
Maninder

On Wed, May 14, 2025 at 6:12 PM Drew <img...@gmail.com> wrote:

> Hi everyone,
>
> Thank you for the feedback on the MTT proposal and during the community
> sync. Based on it, Jagdeep and I have iterated on the document and added
> a second option that uses *Catalog CommitSequenceNumbers*. We look forward
> to more feedback on the proposal, including where to add more detail and
> which approaches or changes to consider. We appreciate everyone's time on
> this!
>
> The option introduces *Catalog CommitSequenceNumbers (CSNs)*, which allow
> clients/engines to read a consistent view of multiple tables without
> registering a transaction context with the catalog, so the catalog no
> longer needs to do any transaction bookkeeping. To abort transactions
> early, clients can call LoadTable with and without a CSN to detect whether
> a conflicting write has already landed on any of the tables being
> modified. We also removed the section in which transactions staged
> commits on the catalog, and changed the proposal to align with Eduard's PR
> on staging changes locally before commit
> (https://github.com/apache/iceberg/pull/6948).
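
A minimal sketch of the CSN flow described in the paragraph above, assuming a hypothetical in-memory catalog. `HypotheticalCatalog`, `commit`, `current_csn`, `load_table` and `has_conflict` are illustrative names only, not the Iceberg REST API or the proposal's actual design. Every commit receives the next CSN; a reader pins one CSN and loads every table as of that CSN to get a consistent multi-table view; before committing, it compares each table it modifies as of its read CSN against the latest state and aborts early on any difference. The sketch conservatively treats any newer commit as a conflict; a real implementation would presumably check whether the changes actually overlap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HypotheticalCatalog:
    """In-memory stand-in (hypothetical) for a catalog that stamps every
    commit with a monotonically increasing CommitSequenceNumber (CSN)."""

    last_csn: int = 0
    # table name -> [(csn, snapshot_id), ...] in commit order
    history: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def commit(self, table: str, snapshot_id: int) -> int:
        self.last_csn += 1
        self.history.setdefault(table, []).append((self.last_csn, snapshot_id))
        return self.last_csn

    def current_csn(self) -> int:
        return self.last_csn

    def load_table(self, table: str, csn: Optional[int] = None) -> Optional[int]:
        """Snapshot id visible as of `csn`, or the latest one if csn is None."""
        visible = None
        for commit_csn, snapshot_id in self.history.get(table, []):
            if csn is not None and commit_csn > csn:
                break  # history is CSN-ordered, so nothing later is visible
            visible = snapshot_id
        return visible


def has_conflict(catalog: HypotheticalCatalog, tables: List[str], read_csn: int) -> bool:
    """Early-abort check: load each modified table with and without the read
    CSN. Any difference means another writer committed after read_csn
    (conservatively treated as a conflict)."""
    return any(
        catalog.load_table(t, csn=read_csn) != catalog.load_table(t) for t in tables
    )


catalog = HypotheticalCatalog()
catalog.commit("orders", 1)
catalog.commit("customers", 10)
read_csn = catalog.current_csn()  # pin one consistent view across tables
view = {t: catalog.load_table(t, csn=read_csn) for t in ("orders", "customers")}
catalog.commit("orders", 2)  # a concurrent writer lands on "orders"
print(view)  # {'orders': 1, 'customers': 10}
print(has_conflict(catalog, ["orders"], read_csn))  # True
print(has_conflict(catalog, ["customers"], read_csn))  # False
```

Note that the reader never registers anything with the catalog: the pinned CSN lives only on the client, which is the property the paragraph relies on to avoid catalog-side transaction bookkeeping.
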
>
> In a previous email, Jagdeep also clarified with an example that a
> workload may require multi-table snapshot isolation even when the tables
> are updated without the multi-table commit API, though most MTT
> transactions will commit using that API.
>
> Maninder, regarding the approach of a "common notion of time between
> clients and catalog": I spent some time thinking about it but cannot find
> a feasible way to do this. Yes, catalogs can use a high-precision clock,
> but clients cannot use the catalog timestamp returned by API calls to set
> their local clock, because of network latency on each request/response.
> For example, different requests to the same catalog servers can return
> timestamps that imply different clock offsets, depending on network
> latency. Also, what if a client works with more than one catalog? If you
> want to do a rough write-up or share a reference implementation that uses
> such an approach, I will be happy to brainstorm it further. Let us know!
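
To make the latency argument concrete, here is a sketch using a Cristian-style offset estimate, a standard clock-synchronization technique swapped in for illustration rather than anything either proposal specifies. It assumes the catalog stamps its response at some unknown point inside the round trip and that clock drift over one round trip is negligible; `estimate_offset` is a hypothetical helper. Under those assumptions the client can only bound the catalog's clock offset to within half the round-trip time, so two requests to the same catalog with different (or asymmetric) latencies yield different estimates.

```python
from typing import Tuple


def estimate_offset(t_send: float, t_recv: float, server_ts: float) -> Tuple[float, float]:
    """Estimate (catalog clock - client clock) from one request/response.

    t_send and t_recv are client-local times; server_ts is the timestamp the
    catalog returned. Assumes the catalog stamped server_ts somewhere within
    the round trip and that clock drift during one round trip is negligible.
    Returns (offset_estimate, error_bound).
    """
    rtt = t_recv - t_send
    if rtt < 0:
        raise ValueError("t_recv must not precede t_send")
    # At t_recv the catalog clock reads somewhere in [server_ts, server_ts + rtt];
    # the midpoint minimizes the worst-case error, which is rtt / 2.
    estimated_catalog_now = server_ts + rtt / 2
    return estimated_catalog_now - t_recv, rtt / 2


# Simulated requests (times in ms); the catalog clock is truly 5 ms ahead.
# Request A: 1 ms each way, stamped at client time 101 -> server_ts = 106.
print(estimate_offset(100.0, 102.0, 106.0))  # (5.0, 1.0)
# Request B: 8 ms out, 2 ms back, stamped at client time 208 -> server_ts = 213.
print(estimate_offset(200.0, 210.0, 213.0))  # (8.0, 5.0)
```

With a true offset of 5 ms in both simulated requests, the asymmetric-latency request estimates 8 ms: still within its ±5 ms bound, but not precise enough to act as a shared clock. A client talking to a second catalog would need a separate estimate with its own error bound.
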
>
> Here is the link to the updated proposal:
>
> <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>
>
> Thanks again!
> - Drew
>
