Hi community,

I have updated the proposal with both options (overwriting the existing
timestamp-ms field vs. introducing a new sequence/timestamp field), since we
have initial consensus on using a catalog-authored sequence/timestamp.
Jagdeep, please review to ensure that the options are captured correctly. I
have also added arguments on why we can't treat the timestamp as
"informational": it is used in critical paths, and incorrect values can
take the table offline.
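
To make the two ideas concrete, here is a minimal sketch, not the proposal's
actual design. It assumes a catalog that hands out strictly increasing
millisecond timestamps and a client that picks, per table, the latest
snapshot at or before a reference timestamp. All names (CatalogClock,
Snapshot, snapshot_as_of, consistent_view) are hypothetical and do not exist
in Iceberg or the REST spec.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class CatalogClock:
    """Hypothetical catalog-side clock: strictly increasing ms timestamps."""

    def __init__(self) -> None:
        self._last_ms = 0

    def next_timestamp_ms(self, wall_clock_ms: int) -> int:
        # Never go backwards and never repeat, even if the wall clock
        # stalls or jumps back (assumption: one clock per catalog).
        self._last_ms = max(wall_clock_ms, self._last_ms + 1)
        return self._last_ms


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int  # assumed to be authored by the catalog clock


def snapshot_as_of(history: List[Snapshot], reference_ms: int) -> Optional[Snapshot]:
    """Latest snapshot whose timestamp is at or before the reference."""
    candidates = [s for s in history if s.timestamp_ms <= reference_ms]
    return max(candidates, key=lambda s: s.timestamp_ms) if candidates else None


def consistent_view(
    tables: Dict[str, List[Snapshot]], reference_ms: int
) -> Dict[str, Optional[Snapshot]]:
    """Resolve every table against the same reference timestamp."""
    return {name: snapshot_as_of(h, reference_ms) for name, h in tables.items()}


clock = CatalogClock()
t1 = clock.next_timestamp_ms(1000)  # wall clock reads 1000
t2 = clock.next_timestamp_ms(1000)  # same millisecond: bumped by one
t3 = clock.next_timestamp_ms(900)   # wall clock went backwards: still increases
t4 = clock.next_timestamp_ms(5000)  # wall clock jumped ahead: follow it

tables = {
    "orders": [Snapshot(1, t1), Snapshot(2, t3)],
    "items": [Snapshot(10, t2), Snapshot(11, t4)],
}
view = consistent_view(tables, reference_ms=t2)
```

Because every timestamp comes from one clock, resolving all tables against
the same reference gives a repeatable, catalog-wide view without the catalog
keeping any per-transaction state.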

Also, I'm moving the meeting to Thursday to better accommodate conflicts. I
will also record the meeting in case anyone who is interested in the
discussion can't attend.

Sync for iceberg multi-table transactions
Thursday, May 29 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/ffc-ttjs-vti

Thanks,
Maninder



On Mon, May 26, 2025 at 12:47 AM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> I'm interested but can't be there, so please record the meeting.
> Thanks,
> Peter
>
> Maninderjit Singh <parmar.maninder...@gmail.com> wrote (on Sat, May 24,
> 2025, 2:30):
>
>> Hi dev community,
>> I was wondering if we could join a call next week to discuss the
>> multi-table transactions so we can make progress. I have shared a meeting
>> invite that anyone interested in the discussion can join. Please let
>> me know if this works.
>>
>> Thanks,
>> Maninder
>>
>> Sync for iceberg multi-table transactions
>> Friday, May 30 · 9:00 – 10:00am
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/ffc-ttjs-vti
>>
>>
>> On Wed, May 21, 2025 at 10:25 AM Maninderjit Singh <
>> parmar.maninder...@gmail.com> wrote:
>>
>>> Hi dev community,
>>> Following up on the thread here to continue the discussion and get
>>> feedback, since we couldn't get to it in the sync. I think we have made
>>> some progress in the discussion that I want to capture, while highlighting
>>> the items where we still need to build consensus, along with their pros
>>> and cons. I would need help to add clarity and to make sure the arguments
>>> are captured correctly.
>>>
>>> *Things we agree on*
>>>
>>>    1. Don't maintain server side state for tracking the transactions.
>>>    2. Need global (catalog-wide) ordering of snapshots via some
>>>    (hybrid/logical) clock/CSN
>>>    3. Optionally expose the catalog's clock/CSN information without
>>>    changing how tables load
>>>    4. Loading consistent snapshot across multiple tables and repeatable
>>>    reads based on the reference clock/CSN
>>>
>>>
>>> *Things we disagree on*
>>>
>>>    1. Reuse existing timestamp field vs introduce a new field CSN
>>>
>>>
>>> *Reusing timestamp field approach*
>>>
>>>    - Pros:
>>>
>>>
>>>    1. Backwards compatibility: no change to the table metadata spec, so
>>>    it could be used by existing v2 tables.
>>>    2. Fixes existing time travel and ordering issues
>>>    3. Simplifies and clarifies the spec (no new id for snapshots)
>>>    4. Common notion of timestamp that could be used to evaluate causal
>>>    relationships in other proposals like events or commit reports.
>>>
>>>
>>>    - Cons:
>>>
>>>
>>>    1. Unique timestamp generation in milliseconds. Potential
>>>    mitigations:
>>>    https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&disco=AAABjwaxXeg
>>>    2. Concerns about client side timestamp being overridden.
>>>
>>> *Adding new CSN field*
>>>
>>>    - Pros:
>>>
>>>
>>>    1. Flexibility to use logical or hybrid clocks. It would be good to
>>>    clarify how clients can generate a hybrid clock timestamp here without
>>>    suffering from clock skew.
>>>    2. No client side overriding concerns.
>>>
>>>
>>>    - Cons:
>>>
>>>
>>>    1. Not backwards compatible: requires a new field in table metadata,
>>>    so it needs to wait for v4.
>>>    2. Does not fix time travel and snapshot-log ordering issues.
>>>    3. Adds another id for snapshots that clients need to generate and
>>>    reason about.
>>>    4. Cannot be extended for use in other proposals for causal
>>>    reasoning.
>>>
>>>
>>> Thanks,
>>> Maninder
>>>
>>> On Tue, May 20, 2025 at 8:16 PM Maninderjit Singh <
>>> parmar.maninder...@gmail.com> wrote:
>>>
>>>> Appreciate the feedback on the "catalog-authored timestamp" document
>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0>
>>>> !
>>>>
>>>> Ryan, I don't think we can get consistent time travel queries in
>>>> Iceberg without fixing the timestamp field, since it's what the spec
>>>> <https://iceberg.apache.org/spec/#point-in-time-reads-time-travel>
>>>> prescribes for time travel. Hence I took the liberty of reusing it for
>>>> the catalog timestamp, which ensures that the snapshot-log is correctly
>>>> ordered for time travel. Additionally, the timestamp field needs to be
>>>> fixed so that accidental large clock skews do not break commits to the
>>>> table under the current spec; the scenario is described in detail here
>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.6avx66vzo168>
>>>> .
>>>> The other benefit of reusing the timestamp field is spec simplicity and
>>>> clarity on timestamp generation responsibilities, without needing to
>>>> manage yet another identifier for snapshots (in addition to sequence
>>>> number, snapshot id and timestamp).
>>>>
>>>> Jagdeep, your concerns about overriding the timestamp field are valid,
>>>> but the reason I'm not too worried is that clients can't assume a commit
>>>> is successful until the catalog acknowledges their request by returning
>>>> the CommitTableResponse
>>>> <https://github.com/apache/iceberg/blob/c2478968e65368c61799d8ca4b89506a61ca3e7c/open-api/rest-catalog-open-api.yaml#L3997>
>>>> with new metadata (which has catalog-authored timestamps in the
>>>> proposal). I'm happy to work with you to put something common together
>>>> and get the best out of the proposals.
>>>>
>>>> Thanks,
>>>> Maninder
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 20, 2025 at 5:48 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you Ryan, Maninder and the rest of the community for feedback
>>>>> and ideas!
>>>>> Drew and I will take another pass and remove the catalog coordination
>>>>> requirement for the LoadTable API, and bring the proposal closer to
>>>>> "catalog-authored timestamp" in the sense that clients can use the CSN
>>>>> to find the right snapshot, but still leave it up to the Catalog what it
>>>>> wants to use for the CSN (hybrid clock timestamp or another
>>>>> monotonically increasing number).
>>>>>
>>>>> If more folks have feedback, please leave it in the doc or email list,
>>>>> so we can address it as well in the document update.
>>>>>
>>>>> Maninder, one reason we proposed a new field for CommitSequenceNumber
>>>>> instead of using an existing field is backwards compatibility.
>>>>> Catalogs can start optionally exposing the new field, and interested
>>>>> clients can use it, but existing clients keep working as is. Existing
>>>>> and new clients can also keep working as is against the same tables in
>>>>> the same Catalog. My one worry is that having the Catalog override the
>>>>> timestamp field for commits may break some existing clients. Today, no
>>>>> Iceberg engines/clients expect the timestamp field in
>>>>> metadata/snapshot-log to be overwritten by the Catalog.
>>>>>
>>>>> How do you feel about taking the best from each proposal? That is, use
>>>>> monotonically increasing commit sequence numbers (some catalogs can use
>>>>> timestamps, some can use a logical clock, but we don't have to enforce
>>>>> it; leave it up to the Catalog), but keep the client side logic for
>>>>> resolving the right snapshot using sequence numbers instead of adding
>>>>> that functionality to the Catalog. Let me know!
>>>>>
>>>>> Thank you!
>>>>> -Jagdeep
>>>>>
>>>>> On Tue, May 20, 2025 at 2:45 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the proposals! There are things that I think are good
>>>>>> about both of them. I think that the catalog-authored timestamps
>>>>>> proposal misunderstands the purpose of the timestamp field, but does
>>>>>> get right that a monotonically increasing "time" field (really a
>>>>>> sequence number) across tables enables the coordination needed for
>>>>>> snapshot isolated reads. I like that the sequence number proposal
>>>>>> leaves the meaning of the field to the catalog for coordination, but it
>>>>>> still proposes catalog coordination by loading tables "at" some
>>>>>> sequence number. Ideally, we would be able to (optionally) expose this
>>>>>> extra catalog information to clients and not need to change how loading
>>>>>> works.
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>> On Tue, May 20, 2025 at 9:45 AM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> To avoid passing copies of a file around for comments, I put the doc
>>>>>>> for commit sequence numbers into Google so we can comment on a central
>>>>>>> copy:
>>>>>>> https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100239850723655533404&rtpof=true&sd=true
>>>>>>>
>>>>>>> Ryan
>>>>>>>
>>>>>>> On Fri, May 16, 2025 at 2:51 AM Maninderjit Singh <
>>>>>>> parmar.maninder...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for the updated proposal Drew!
>>>>>>>> My preference for using the catalog authored timestamp is to
>>>>>>>> minimize changes to the REST spec so we can have good backwards
>>>>>>>> compatibility. I have quickly put together a draft proposal on how this
>>>>>>>> should work. Looking forward to feedback and discussion.
>>>>>>>>
>>>>>>>>  Draft Proposal: Catalog‑Authored Timestamps for Apache Iceberg
>>>>>>>> REST Catalog
>>>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Maninder
>>>>>>>>
>>>>>>>> On Wed, May 14, 2025 at 6:12 PM Drew <img...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> Thank you for the feedback on the MTT proposal and during the
>>>>>>>>> community sync. Based on it, Jagdeep and I have iterated on the
>>>>>>>>> document and added a second option to use *Catalog
>>>>>>>>> CommitSequenceNumbers*. We look forward to getting more feedback on
>>>>>>>>> the proposal, including where to add more details and which
>>>>>>>>> approaches or changes to consider. We appreciate everyone's time on
>>>>>>>>> this!
>>>>>>>>>
>>>>>>>>> The option introduces *Catalog CommitSequenceNumbers (CSNs)*,
>>>>>>>>> which allow clients/engines to read a consistent view of multiple
>>>>>>>>> tables without needing to register a transaction context with the
>>>>>>>>> catalog. This also removes the need for transaction bookkeeping on
>>>>>>>>> the catalog side. To abort transactions early, clients can use
>>>>>>>>> LoadTable with and without a CSN to figure out whether there is
>>>>>>>>> already a conflicting write on any of the tables being modified. We
>>>>>>>>> also removed the section where transactions were staging commits on
>>>>>>>>> the Catalog, and changed the proposal to align with Eduard's PR
>>>>>>>>> around staging changes locally before commit (
>>>>>>>>> https://github.com/apache/iceberg/pull/6948).
>>>>>>>>>
>>>>>>>>> In a previous email, Jagdeep also gave an example of a workload
>>>>>>>>> that may require multi-table snapshot isolation even if the tables
>>>>>>>>> are updated without the multi-table commit API, though most MTT
>>>>>>>>> transactions will commit using the multi-table commit API.
>>>>>>>>>
>>>>>>>>> Maninder, regarding the approach of a "common notion of time
>>>>>>>>> between clients and catalog": I spent some time thinking about it,
>>>>>>>>> but cannot find a feasible way to do this. Yes, the catalogs can use
>>>>>>>>> a high precision clock, but clients cannot use the Catalog timestamp
>>>>>>>>> from API calls to set their local clock, due to network latency for
>>>>>>>>> the request/response. For example, different requests to the same
>>>>>>>>> Catalog servers can return different timestamps based on network
>>>>>>>>> latency. Also, what if a client works with more than one Catalog?
>>>>>>>>> If you want to do a rough write-up or share a reference
>>>>>>>>> implementation that uses such an approach, I will be happy to
>>>>>>>>> brainstorm it more. Let us know!
>>>>>>>>>
>>>>>>>>> Here is the link to the updated proposal:
>>>>>>>>> <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>
>>>>>>>>>
>>>>>>>>> Thanks again!
>>>>>>>>> - Drew
>>>>>>>>>
>>>>>>>>
