Re: Discuss proposal - IRC APIs for Multi-Statement Multi-Table Transactions

Maninderjit Singh Fri, 23 May 2025 17:29:37 -0700

Hi dev community,
I was wondering if we could join a call next week to discuss the
multi-table transactions so we can make progress. I have shared a meeting
invite that anyone interested in the discussion can join. Please let
me know if this works.


Thanks,
Maninder

Sync for iceberg multi-table transactions
Friday, May 30 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/ffc-ttjs-vti


On Wed, May 21, 2025 at 10:25 AM Maninderjit Singh <
parmar.maninder...@gmail.com> wrote:

> Hi dev community,
> Following up on the thread here to continue the discussion and get
> feedback, since we couldn't get to it in the sync. I think we have made some
> progress in the discussion that I want to capture, while highlighting the
> items where we need to build consensus, along with their pros and cons. I
> would appreciate help adding clarity and making sure the arguments are
> captured correctly.
>
> *Things we agree on*
>
>    1. Don't maintain server side state for tracking the transactions.
>    2. Need global (catalog-wide) ordering of snapshots via some
>    (hybrid/logical) clock/CSN
>    3. Optionally expose the catalog's clock/CSN information without
>    changing how tables load
>    4. Loading consistent snapshot across multiple tables and repeatable
>    reads based on the reference clock/CSN
>
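
A minimal sketch of item 4 above, assuming each snapshot-log entry carries a catalog-assigned ordering value. The names `SnapshotLogEntry`, `csn`, `resolve_at`, and `consistent_view` are hypothetical and not part of the Iceberg spec or either proposal; the point is only that a client could pick, per table, the latest snapshot at or before one shared reference value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnapshotLogEntry:
    snapshot_id: int
    csn: int  # hypothetical catalog-wide ordering value (clock or CSN)


def resolve_at(log: list[SnapshotLogEntry], reference_csn: int) -> Optional[int]:
    """Return the id of the latest snapshot with csn <= reference_csn."""
    candidates = [entry for entry in log if entry.csn <= reference_csn]
    if not candidates:
        return None  # the table had no snapshot at that point
    return max(candidates, key=lambda entry: entry.csn).snapshot_id


def consistent_view(
    logs: dict[str, list[SnapshotLogEntry]], reference_csn: int
) -> dict[str, Optional[int]]:
    """Resolve every table against the same reference value."""
    return {name: resolve_at(log, reference_csn) for name, log in logs.items()}


# Illustrative data only.
logs = {
    "orders": [SnapshotLogEntry(101, 5), SnapshotLogEntry(102, 9)],
    "customers": [SnapshotLogEntry(201, 7), SnapshotLogEntry(202, 12)],
}
view = consistent_view(logs, reference_csn=10)
```

Re-running `consistent_view` with the same reference value returns the same snapshot ids even after later commits, which is what would make the read repeatable.
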
>
> *Things we disagree on*
>
>    1. Reuse existing timestamp field vs introduce a new field CSN
>
>
> *Reusing timestamp field approach*
>
>    - Pros:
>
>
>    1. Backwards compatibility, no change to table metadata spec so could
>    be used by existing v2 tables.
>    2. Fixes existing time travel and ordering issues
>    3. Simplifies and clarifies the spec (no new id for snapshots)
>    4. Common notion of timestamp that could be used to evaluate causal
>    relationships in other proposals like events or commit reports.
>
>
>    - Cons
>
>
>    1. Unique timestamp generation in milliseconds. Potential mitigations:
>    
> https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&disco=AAABjwaxXeg
>    2. Concerns about client side timestamp being overridden.
>
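
One common way to get unique, strictly increasing millisecond timestamps on a single issuer is to bump the value whenever the wall clock has not advanced. The sketch below illustrates that general technique only; it is not necessarily the mitigation discussed in the linked comment, and `MonotonicMillisClock` is a hypothetical name.

```python
import threading
import time
from typing import Callable, Optional


class MonotonicMillisClock:
    """Issues strictly increasing millisecond timestamps from one process."""

    def __init__(self, now_ms: Optional[Callable[[], int]] = None) -> None:
        # now_ms is injectable so the behaviour can be tested deterministically.
        if now_ms is None:
            now_ms = lambda: time.time_ns() // 1_000_000
        self._now_ms = now_ms
        self._last = 0
        self._lock = threading.Lock()

    def next_timestamp(self) -> int:
        with self._lock:
            candidate = self._now_ms()
            if candidate <= self._last:
                # Same millisecond, or the wall clock moved backwards.
                candidate = self._last + 1
            self._last = candidate
            return candidate


# A frozen wall clock still yields distinct, ordered values.
clock = MonotonicMillisClock(now_ms=lambda: 1_000)
stamps = [clock.next_timestamp() for _ in range(3)]
```

The trade-off is that under sustained rates above one commit per millisecond the issued values drift ahead of the wall clock, and a multi-node catalog would still need to coordinate the last issued value.
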
> *Adding new CSN field*
>
>    - Pros:
>
>
>    1. Flexibility to use logical or hybrid clocks. (It would be good to
>    clarify how clients can generate a hybrid clock timestamp here without
>    suffering from clock skew.)
>    2. No client side overriding concerns.
>
>
>    - Cons:
>
>
>    1. Not backwards compatible, requires new field in table metadata so
>    need to wait for v4
>    2. Does not fix time travel and snapshot-log ordering issues
>    3. Adds another id for snapshots that clients need to generate and
>    reason about.
>    4. Cannot be extended for use in other proposals for causal
>    reasoning.
>
>
> Thanks,
> Maninder
>
> On Tue, May 20, 2025 at 8:16 PM Maninderjit Singh <
> parmar.maninder...@gmail.com> wrote:
>
>> Appreciate the feedback on the "catalog-authored timestamp" document
>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0>!
>>
>> Ryan, I don't think we can get consistent time travel queries in Iceberg
>> without fixing the timestamp field, since it's what the spec
>> <https://iceberg.apache.org/spec/#point-in-time-reads-time-travel>
>> prescribes for time travel. Hence I took the liberty of reusing it for the
>> catalog timestamp, which ensures that the snapshot-log is correctly ordered
>> for time travel. Additionally, the timestamp field needs to be fixed to avoid
>> breaking commits to the table due to accidental large skews under the current
>> spec; the scenario is described in detail here
>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.6avx66vzo168>.
>> The other benefit of reusing the timestamp field is spec simplicity and
>> clarity on timestamp generation responsibilities, without the need
>> to manage yet another identifier (in addition to sequence number, snapshot
>> id and timestamp) for snapshots.
>>
>> Jagdeep, your concerns about overriding the timestamp field are valid, but
>> the reason I'm not too worried about it is that a client can't assume a
>> commit is successful until the catalog acknowledges it by returning the
>> CommitTableResponse
>> <https://github.com/apache/iceberg/blob/c2478968e65368c61799d8ca4b89506a61ca3e7c/open-api/rest-catalog-open-api.yaml#L3997>
>> with new metadata (that has catalog-authored timestamps in the proposal). I'm
>> happy to work with you to put something common together and get the best
>> out of the proposals.
>>
>> Thanks,
>> Maninder
>>
>>
>>
>>
>> On Tue, May 20, 2025 at 5:48 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>> wrote:
>>
>>> Thank you Ryan, Maninder and the rest of the community for feedback and
>>> ideas!
>>> Drew and I will take another pass and remove the catalog coordination
>>> requirement for the LoadTable API, and bring the proposal closer to
>>> "catalog-authored timestamp" in the sense that clients can use the CSN to
>>> find the right snapshot, but still leave it up to the Catalog what it wants
>>> to use for the CSN (a hybrid clock timestamp or another monotonically
>>> increasing number).
>>>
>>> If more folks have feedback, please leave it in the doc or email list,
>>> so we can address it as well in the document update.
>>>
>>> Maninder, one reason we proposed a new field for CommitSequenceNumber
>>> instead of using an existing field is for backwards compatibility. Catalogs
>>> can start optionally exposing the new field, and interested clients can use
>>> the new field, but existing clients keep working as is. Existing and new
>>> clients can also keep working as is against the same tables in the
>>> same Catalog. My one worry is that having the Catalog override the timestamp
>>> field for commits may break some existing clients. Today, no Iceberg
>>> engines/clients expect the timestamp field in metadata/snapshot-log
>>> to be overwritten by the Catalog.
>>>
>>> How do you feel about taking the best from each proposal? That is, use
>>> monotonically increasing commit sequence numbers (some catalogs can use
>>> timestamps, some can use a logical clock, but we don't have to enforce it;
>>> leave it up to the Catalog), but keep the client-side logic for resolving the
>>> right snapshot using sequence numbers instead of adding that functionality
>>> to the Catalog. Let me know!
>>>
>>> Thank you!
>>> -Jagdeep
>>>
>>> On Tue, May 20, 2025 at 2:45 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>>> Thanks for the proposals! There are things that I think are good about
>>>> both of them. I think that the catalog-authored timestamps proposal
>>>> misunderstands the purpose of the timestamp field, but does get right that
>>>> a monotonically increasing "time" field (really a sequence number) across
>>>> tables enables the coordination needed for snapshot isolated reads. I like
>>>> that the sequence number proposal leaves the meaning of the field to the
>>>> catalog for coordination, but it still proposes catalog coordination by
>>>> loading tables "at" some sequence number. Ideally, we would be able to
>>>> (optionally) expose this extra catalog information to clients and not need
>>>> to change how loading works.
>>>>
>>>> Ryan
>>>>
>>>> On Tue, May 20, 2025 at 9:45 AM Ryan Blue <rdb...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> To avoid passing copies of a file around for comments, I put the doc
>>>>> for commit sequence numbers into Google so we can comment on a central
>>>>> copy:
>>>>> https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100239850723655533404&rtpof=true&sd=true
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Fri, May 16, 2025 at 2:51 AM Maninderjit Singh <
>>>>> parmar.maninder...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the updated proposal Drew!
>>>>>> My preference for using the catalog-authored timestamp is to minimize
>>>>>> changes to the REST spec so we can have good backwards compatibility. I
>>>>>> have quickly put together a draft proposal on how this should work.
>>>>>> Looking forward to feedback and discussion.
>>>>>>
>>>>>>  Draft Proposal: Catalog‑Authored Timestamps for Apache Iceberg REST
>>>>>> Catalog
>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>
>>>>>> Thanks,
>>>>>> Maninder
>>>>>>
>>>>>> On Wed, May 14, 2025 at 6:12 PM Drew <img...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Thank you for the feedback on the MTT proposal and during the community
>>>>>>> sync. Based on it, Jagdeep and I have iterated on the document and
>>>>>>> added a second option to use *Catalog CommitSequenceNumbers*. We look
>>>>>>> forward to more feedback on the proposal: where to add more details,
>>>>>>> or approaches/changes to consider. We appreciate everyone's time on this!
>>>>>>>
>>>>>>> The option introduces *Catalog CommitSequenceNumbers (CSNs)*, which
>>>>>>> allow clients/engines to read a consistent view of multiple tables
>>>>>>> without registering a transaction context with the catalog, which in
>>>>>>> turn removes the need for transaction bookkeeping on the catalog side.
>>>>>>> To abort transactions early, clients can use LoadTable with and without
>>>>>>> a CSN to figure out if there is already a conflicting write on any of
>>>>>>> the tables being modified. We also removed the section where
>>>>>>> transactions were staging commits on the Catalog, and changed the
>>>>>>> proposal to align with Eduard's PR around staging changes locally
>>>>>>> before commit (
>>>>>>> https://github.com/apache/iceberg/pull/6948).
>>>>>>>
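
A rough sketch of the early-abort check described above, assuming a hypothetical `load_snapshot_id(table, csn)` callable that stands in for LoadTable with and without a CSN. The real request shape is defined in the proposal, not here, and `FAKE_CATALOG` is an in-memory stand-in with illustrative data.

```python
from typing import Callable, Optional

# (table name, optional CSN) -> current snapshot id, or None if no snapshot.
LoadFn = Callable[[str, Optional[int]], Optional[int]]


def tables_with_newer_writes(
    load_snapshot_id: LoadFn, tables: list[str], start_csn: int
) -> list[str]:
    """Return the tables whose latest snapshot differs from the one visible
    at the transaction's starting CSN."""
    changed = []
    for name in tables:
        at_start = load_snapshot_id(name, start_csn)  # "with CSN"
        latest = load_snapshot_id(name, None)         # "without CSN"
        if at_start != latest:
            changed.append(name)
    return changed


# In-memory stand-in for a catalog: table -> [(csn, snapshot_id), ...] in order.
FAKE_CATALOG = {
    "orders": [(5, 101), (9, 102)],
    "customers": [(7, 201), (12, 202)],
}


def fake_load(name: str, csn: Optional[int] = None) -> Optional[int]:
    visible = [sid for c, sid in FAKE_CATALOG[name] if csn is None or c <= csn]
    return visible[-1] if visible else None


changed = tables_with_newer_writes(fake_load, ["orders", "customers"], start_csn=10)
```

A newer snapshot is only a signal that a conflict may exist; whether it actually conflicts still depends on the usual commit validation.
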
>>>>>>> In a previous email, Jagdeep also gave an example of a workload that
>>>>>>> may require multi-table snapshot isolation even if the tables are
>>>>>>> being updated without the multi-table commit API, though most MTT
>>>>>>> transactions will commit using the multi-table commit API.
>>>>>>>
>>>>>>> Maninder, for the approach of "common notion of time between clients
>>>>>>> and catalog" - I spent some time thinking about it, but cannot find a
>>>>>>> feasible way to do this. Yes, the catalogs can use a high-precision
>>>>>>> clock, but clients cannot use the Catalog timestamp from API calls to
>>>>>>> set their local clock due to network latency for the request/response.
>>>>>>> For example, different requests to the same Catalog servers can return
>>>>>>> different timestamps based on network latency. Also, what if a client
>>>>>>> works with more than one Catalog? If you want to do a rough write-up or
>>>>>>> share a reference implementation that uses such an approach, I will be
>>>>>>> happy to brainstorm it more. Let us know!
>>>>>>>
>>>>>>> Here is the link to the updated proposal:
>>>>>>> <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>
>>>>>>> Thanks Again!
>>>>>>> - Drew
>>>>>>>
>>>>>>
