> One thing that I think has changed since the initial proposal are column level updates. Have we considered the interaction between these two features?
Micah, that is a great call out. I think the *_last_updated_sequence_number* and *_last_updated_timestamp* should work the same way regarding column updates. They should reflect the latest snapshot of either the base file or the newest column file. On Tue, May 12, 2026 at 1:22 PM Micah Kornfield <[email protected]> wrote: > #2 is more involved and should probably be done after the v4 metadata tree >> (spec <https://github.com/apache/iceberg/pull/16025> and impl >> <https://github.com/orgs/apache/projects/605/views/1>) is mostly >> complete, as we want to plumb inheritance through only for the v4 tables. > > > One thing that I think has changed since the initial proposal are column > level updates. Have we considered the interaction between these two > features? > > Thanks, > Micah > > On Mon, May 11, 2026 at 6:30 PM Steven Wu <[email protected]> wrote: > >> Circling back on this topic, since we have consensus on the direction. It >> essentially has two parts >> >> 1. monotonic snapshot timestamp for v4 tables >> 2. row timestamp inherited from snapshot timestamp for v4 tables >> >> >> #1 is an isolated and small change. So I created the following PRs: >> * spec: https://github.com/apache/iceberg/pull/16294 >> * impl: https://github.com/apache/iceberg/pull/16293 >> >> #2 is more involved and should probably be done after the v4 metadata >> tree (spec <https://github.com/apache/iceberg/pull/16025> and impl >> <https://github.com/orgs/apache/projects/605/views/1>) is mostly >> complete, as we want to plumb inheritance through only for the v4 tables. >> >> >> >> On Mon, Jan 26, 2026 at 10:05 AM Russell Spitzer < >> [email protected]> wrote: >> >>> Sounds good to me >>> >>> On Mon, Jan 26, 2026 at 11:59 AM Anton Okolnychyi <[email protected]> >>> wrote: >>> >>>> Cool, sounds like a plan then? Thanks for answering all the questions, >>>> Steven! >>>> >>>> чт, 22 січ. 2026 р. о 18:29 Steven Wu <[email protected]> пише: >>>> >>>>> For row timestamp inheritance to work, I would need to implement the >>>>> plumbing. So I would imagine existing rows would have null values because >>>>> the inheritance plumbing was not there yet. This would be consistent with >>>>> upgrade behavior for the V3 row lineage: >>>>> https://iceberg.apache.org/spec/#row-lineage-for-upgraded-tables. >>>>> >>>>> On Thu, Jan 22, 2026 at 4:09 PM Anton Okolnychyi < >>>>> [email protected]> wrote: >>>>> >>>>>> Also, do we have a concrete plan for how to handle tables that would >>>>>> be upgraded to V4? What timestamp will we assign to existing rows? >>>>>> >>>>>> On Wed, Jan 21, 2026 at 3:59 PM Anton Okolnychyi < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> If we ignore temporal queries that need strict snapshot boundaries >>>>>>> and can't be solved completely using row timestamps in case of >>>>>>> mutations, >>>>>>> you mentioned other use cases when row timestamps may be helpful like >>>>>>> TTL >>>>>>> and auditing. We can debate whether using CURRENT_TIMESTAMP() is enough >>>>>>> for >>>>>>> them, but I don't really see a point given that we already have row >>>>>>> lineage >>>>>>> in V3 and the storage overhead for one more field isn't likely to be >>>>>>> noticable. One of the problems with CURRENT_TIMESTAMP() is the required >>>>>>> action by the user. Having a reliable row timestamp populated >>>>>>> automatically >>>>>>> is likely to be better, so +1. >>>>>>> >>>>>>> пт, 16 січ. 2026 р. о 14:30 Steven Wu <[email protected]> пише: >>>>>>> >>>>>>>> Joining with snapshot history also has significant complexity. It >>>>>>>> requires retaining the entire snapshot history with probably trimmed >>>>>>>> snapshot metadata. There are concerns on the size of the snapshot >>>>>>>> history >>>>>>>> for tables with frequent commits (like streaming ingestion). Do we >>>>>>>> maintain >>>>>>>> the unbounded trimmed snapshot history in the same table metadata, >>>>>>>> which >>>>>>>> could affect table metadata.json size? or store it separately somewhere >>>>>>>> (like in catalog), which would require the complexity of multi-entity >>>>>>>> transaction in catalog? >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jan 16, 2026 at 12:07 PM Russell Spitzer < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> I've gone back and forth on the inherited columns. I think the >>>>>>>>> thing which keeps coming back to me is that I don't >>>>>>>>> like that the only way to determine the timestamp associated with >>>>>>>>> a row update/creation is to do a join back >>>>>>>>> against table metadata. While that's doable, It feels user >>>>>>>>> unfriendly. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jan 16, 2026 at 11:54 AM Steven Wu <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Anton, you are right that the row-level deletes will be a problem >>>>>>>>>> for some of the mentioned use cases (like incremental processing). I >>>>>>>>>> have >>>>>>>>>> clarified the applicability of some use cases to "tables with >>>>>>>>>> inserts and >>>>>>>>>> updates only". >>>>>>>>>> >>>>>>>>>> Right now, we are only tracking modification/commit time (not >>>>>>>>>> insertion time) in case of updates. >>>>>>>>>> >>>>>>>>>> On Thu, Jan 15, 2026 at 6:33 PM Anton Okolnychyi < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> I think there is clear consensus that making snapshot timestamps >>>>>>>>>>> strictly increasing is a positive thing. I am also +1. >>>>>>>>>>> >>>>>>>>>>> - How will row timestamps allow us to reliably implement >>>>>>>>>>> incremental consumption independent of the snapshot retention given >>>>>>>>>>> that >>>>>>>>>>> rows can be added AND removed in a particular time frame? How can we >>>>>>>>>>> capture all changes by just looking at the latest snapshot? >>>>>>>>>>> - Some use cases in the doc need the insertion time and some >>>>>>>>>>> need the last modification time. Do we plan to support both? >>>>>>>>>>> - What do we expect the behavior to be in UPDATE and MERGE >>>>>>>>>>> operations? >>>>>>>>>>> >>>>>>>>>>> To be clear: I am not opposed to this change, just want to make >>>>>>>>>>> sure I understand all use cases that we aim to address and what >>>>>>>>>>> would be >>>>>>>>>>> required in engines. >>>>>>>>>>> >>>>>>>>>>> чт, 15 січ. 2026 р. о 17:01 Maninder Parmar < >>>>>>>>>>> [email protected]> пише: >>>>>>>>>>> >>>>>>>>>>>> +1 for improving how the commit timestamps are >>>>>>>>>>>> assigned monotonically since this requirement has emerged over >>>>>>>>>>>> multiple >>>>>>>>>>>> discussions like notifications, multi-table transactions, time >>>>>>>>>>>> travel >>>>>>>>>>>> accuracy and row timestamps. It would be good to have a single >>>>>>>>>>>> consistent >>>>>>>>>>>> way to represent and assign timestamps that could be leveraged >>>>>>>>>>>> across >>>>>>>>>>>> multiple features. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 15, 2026 at 4:05 PM Ryan Blue <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Yeah, to add my perspective on that discussion, I think my >>>>>>>>>>>>> primary concern is that people expect timestamps to be monotonic >>>>>>>>>>>>> and if >>>>>>>>>>>>> they aren't then a `_last_update_timestamp` field just makes the >>>>>>>>>>>>> problem >>>>>>>>>>>>> worse. But it is _nice_ to have row-level timestamps. So I would >>>>>>>>>>>>> be okay if >>>>>>>>>>>>> we revisit how we assign commit timestamps and improve it so that >>>>>>>>>>>>> you get >>>>>>>>>>>>> monotonic behavior. >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jan 15, 2026 at 2:23 PM Steven Wu < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> We had an offline discussion with Ryan. I revised the >>>>>>>>>>>>>> proposal as follows. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. V4 would require writers to generate *monotonic* snapshot >>>>>>>>>>>>>> timestamps. The proposal doc has a section that describes a >>>>>>>>>>>>>> recommended >>>>>>>>>>>>>> implementation using lamport timestamps. >>>>>>>>>>>>>> 2. Expose *last_update_timestamp* metadata column that >>>>>>>>>>>>>> inherits from snapshot timestamp >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is a relatively low-friction change that can fix the >>>>>>>>>>>>>> time travel problem and enable use cases like latency tracking, >>>>>>>>>>>>>> temporal >>>>>>>>>>>>>> query, TTL, auditing. >>>>>>>>>>>>>> >>>>>>>>>>>>>> There is no accuracy requirement on the timestamp values. In >>>>>>>>>>>>>> practice, modern servers with NTP have pretty reliable wall >>>>>>>>>>>>>> clocks. E.g., >>>>>>>>>>>>>> Java library implemented this validation >>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/035e0fb39d2a949f6343552ade0a7d6c2967e0db/core/src/main/java/org/apache/iceberg/TableMetadata.java#L369-L377> >>>>>>>>>>>>>> that >>>>>>>>>>>>>> protects against backward clock drift up to one minute for >>>>>>>>>>>>>> snapshot >>>>>>>>>>>>>> timestamps. Don't think we have heard many complaints of commit >>>>>>>>>>>>>> failure due >>>>>>>>>>>>>> to that clock drift validation. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Would appreciate feedback on the revised proposal. >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?tab=t.0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Steven >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Jan 13, 2026 at 8:40 PM Anton Okolnychyi < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Steven, I was referring to the fact that CURRENT_TIMESTAMP() >>>>>>>>>>>>>>> is usually evaluated quite early in engines so we could >>>>>>>>>>>>>>> theoretically have >>>>>>>>>>>>>>> another expression closer to the commit time. You are right, >>>>>>>>>>>>>>> though, it >>>>>>>>>>>>>>> won't be the actual commit time given that we have to write it >>>>>>>>>>>>>>> into the >>>>>>>>>>>>>>> files. Also, I don't think generating a timestamp for a row as >>>>>>>>>>>>>>> it is being >>>>>>>>>>>>>>> written is going to be beneficial. To sum up, expression-based >>>>>>>>>>>>>>> defaults >>>>>>>>>>>>>>> would allow us to capture the time the transaction or write >>>>>>>>>>>>>>> starts, but not >>>>>>>>>>>>>>> the actual commit time. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Russell, if the goal is to know what happened to the table >>>>>>>>>>>>>>> in a given time frame, isn't the changelog scan the way to go? >>>>>>>>>>>>>>> It would >>>>>>>>>>>>>>> assign commit ordinals based on lineage and include row-level >>>>>>>>>>>>>>> diffs. How >>>>>>>>>>>>>>> would you be able to determine changes with row timestamps by >>>>>>>>>>>>>>> just looking >>>>>>>>>>>>>>> at the latest snapshot? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It does seem promising to make snapshot timestamps strictly >>>>>>>>>>>>>>> increasing to avoid ambiguity during time travel. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> вт, 13 січ. 2026 р. о 16:33 Ryan Blue <[email protected]> >>>>>>>>>>>>>>> пише: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> > Whether or not "t" is an atomic clock time is not as >>>>>>>>>>>>>>>> important as the query between time bounds making sense. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm not sure I get it then. If we want monotonically >>>>>>>>>>>>>>>> increasing times, but they don't have to be real times then >>>>>>>>>>>>>>>> how do you know >>>>>>>>>>>>>>>> what notion of "time" you care about for these filters? Or to >>>>>>>>>>>>>>>> put it >>>>>>>>>>>>>>>> another way, how do you know that your "before" and "after" >>>>>>>>>>>>>>>> times are >>>>>>>>>>>>>>>> reasonable? If the boundaries of these time queries can move >>>>>>>>>>>>>>>> around a bit, >>>>>>>>>>>>>>>> by how much? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It seems to me that row IDs can play an important role here >>>>>>>>>>>>>>>> because you have the order guarantee that we seem to want for >>>>>>>>>>>>>>>> this use >>>>>>>>>>>>>>>> case: if snapshot A was committed before snapshot B, then the >>>>>>>>>>>>>>>> rows from A >>>>>>>>>>>>>>>> have row IDs that are always less than the rows IDs of B. The >>>>>>>>>>>>>>>> problem is >>>>>>>>>>>>>>>> that we don't know where those row IDs start and end once A >>>>>>>>>>>>>>>> and B are no >>>>>>>>>>>>>>>> longer tracked. Using a "timestamp" seems to work, but I still >>>>>>>>>>>>>>>> worry that >>>>>>>>>>>>>>>> without reliable timestamps that correspond with some >>>>>>>>>>>>>>>> guarantee to real >>>>>>>>>>>>>>>> timestamps, we are creating a feature that seems reliable but >>>>>>>>>>>>>>>> isn't. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm somewhat open to the idea of introducing a snapshot >>>>>>>>>>>>>>>> timestamp that the catalog guarantees is monotonically >>>>>>>>>>>>>>>> increasing. But if >>>>>>>>>>>>>>>> we did that, wouldn't we still need to know the association >>>>>>>>>>>>>>>> between these >>>>>>>>>>>>>>>> timestamps and snapshots after the snapshot metadata expires? >>>>>>>>>>>>>>>> My mental >>>>>>>>>>>>>>>> model is that this would be used to look for data that >>>>>>>>>>>>>>>> arrived, say, 3 >>>>>>>>>>>>>>>> weeks ago on Dec 24th. Since the snapshots metadata is no >>>>>>>>>>>>>>>> longer around we >>>>>>>>>>>>>>>> could use the row timestamp to find those rows. But how do we >>>>>>>>>>>>>>>> know that the >>>>>>>>>>>>>>>> snapshot timestamps correspond to the actual timestamp range >>>>>>>>>>>>>>>> of Dec 24th? >>>>>>>>>>>>>>>> Is it just "close enough" as long as we don't have out of >>>>>>>>>>>>>>>> order timestamps? >>>>>>>>>>>>>>>> This is what I mean by needing to keep track of the >>>>>>>>>>>>>>>> association between >>>>>>>>>>>>>>>> timestamps and snapshots after the metadata expires. Seems >>>>>>>>>>>>>>>> like you either >>>>>>>>>>>>>>>> need to keep track of what the catalog's clock was for events >>>>>>>>>>>>>>>> you care >>>>>>>>>>>>>>>> about, or you don't really care about exact timestamps. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Jan 13, 2026 at 2:22 PM Russell Spitzer < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The key goal here is the ability to answer the question >>>>>>>>>>>>>>>>> "what happened to the table in some time window. (before < t >>>>>>>>>>>>>>>>> < after)?" >>>>>>>>>>>>>>>>> Whether or not "t" is an atomic clock time is not as >>>>>>>>>>>>>>>>> important as the query between time bounds making sense. >>>>>>>>>>>>>>>>> Downstream applications (from what I know) are mostly >>>>>>>>>>>>>>>>> sensitive to getting discrete and well defined answers to >>>>>>>>>>>>>>>>> this question like: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1 < t < 2 should be exclusive of >>>>>>>>>>>>>>>>> 2 < t < 3 should be exclusive of >>>>>>>>>>>>>>>>> 3 < t < 4 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> And the union of these should be the same as the query >>>>>>>>>>>>>>>>> asking for 1 < t < 4 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Currently this is not possible because we have no >>>>>>>>>>>>>>>>> guarantee of ordering in our timestamps >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Snapshots >>>>>>>>>>>>>>>>> A -> B -> C >>>>>>>>>>>>>>>>> Sequence numbers >>>>>>>>>>>>>>>>> 50 -> 51 -> 52 >>>>>>>>>>>>>>>>> Timestamp >>>>>>>>>>>>>>>>> 3 -> 1 -> 2 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This makes time travel always a little wrong to start with. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The Java implementation only allows one minute of negative >>>>>>>>>>>>>>>>> time on commit so we actually kind of do have this as a >>>>>>>>>>>>>>>>> "light monotonicity" requirement but as noted above there >>>>>>>>>>>>>>>>> is no spec requirement for this. While we do have sequence >>>>>>>>>>>>>>>>> number and row id, we still don't have a stable way of >>>>>>>>>>>>>>>>> associating these with a consistent time in an engine >>>>>>>>>>>>>>>>> independent way. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Ideally we just want to have one consistent way of >>>>>>>>>>>>>>>>> answering the question "what did the table look like at time >>>>>>>>>>>>>>>>> t" >>>>>>>>>>>>>>>>> which I think we get by adding in a new field that is a >>>>>>>>>>>>>>>>> timestamp, set by the Catalog close to commit time, >>>>>>>>>>>>>>>>> that always goes up. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm not sure we can really do this with an engine >>>>>>>>>>>>>>>>> expression since they won't know when the data is actually >>>>>>>>>>>>>>>>> committed >>>>>>>>>>>>>>>>> when writing files? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jan 13, 2026 at 3:35 PM Anton Okolnychyi < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This seems like a lot of new complexity in the format. I >>>>>>>>>>>>>>>>>> would like us to explore whether we can build the considered >>>>>>>>>>>>>>>>>> use cases on >>>>>>>>>>>>>>>>>> top of expression-based defaults instead. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> We already plan to support CURRENT_TIMESTAMP() and >>>>>>>>>>>>>>>>>> similar functions that are part of the SQL standard >>>>>>>>>>>>>>>>>> definition for default >>>>>>>>>>>>>>>>>> values. This would provide us a way to know the relative row >>>>>>>>>>>>>>>>>> order. True, >>>>>>>>>>>>>>>>>> this usually will represent the start of the operation. We >>>>>>>>>>>>>>>>>> may define >>>>>>>>>>>>>>>>>> COMMIT_TIMESTAMP() or a similar expression for the actual >>>>>>>>>>>>>>>>>> commit time, if >>>>>>>>>>>>>>>>>> there are use cases that need that. Plus, we may explore an >>>>>>>>>>>>>>>>>> approach >>>>>>>>>>>>>>>>>> similar to MySQL that allows users to reset the default >>>>>>>>>>>>>>>>>> value on update. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Anton >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> вт, 13 січ. 2026 р. о 11:04 Russell Spitzer < >>>>>>>>>>>>>>>>>> [email protected]> пише: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think this is the right step forward. Our current >>>>>>>>>>>>>>>>>>> "timestamp" definition is too ambiguous to be useful so >>>>>>>>>>>>>>>>>>> establishing >>>>>>>>>>>>>>>>>>> a well defined and monotonic timestamp could be really >>>>>>>>>>>>>>>>>>> great. I also like the ability for row's to know this value >>>>>>>>>>>>>>>>>>> without >>>>>>>>>>>>>>>>>>> having to rely on snapshot information which can be >>>>>>>>>>>>>>>>>>> expired. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Mon, Jan 12, 2026 at 11:03 AM Steven Wu < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I have revised the row timestamp proposal with the >>>>>>>>>>>>>>>>>>>> following changes. >>>>>>>>>>>>>>>>>>>> * a new commit_timestamp field in snapshot metadata >>>>>>>>>>>>>>>>>>>> that has nano-second precision. >>>>>>>>>>>>>>>>>>>> * this optional field is only set by the REST catalog >>>>>>>>>>>>>>>>>>>> server >>>>>>>>>>>>>>>>>>>> * it needs to be monotonic (e.g. implemented using >>>>>>>>>>>>>>>>>>>> Lamport timestamp) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?tab=t.0#heading=h.efdngoizchuh >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> Steven >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Fri, Dec 12, 2025 at 2:36 PM Steven Wu < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks for the clarification, Ryan. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> For long-running streaming jobs that commit >>>>>>>>>>>>>>>>>>>>> periodically, it is difficult to establish the constant >>>>>>>>>>>>>>>>>>>>> value of >>>>>>>>>>>>>>>>>>>>> current_timestamp across all writer tasks for each commit >>>>>>>>>>>>>>>>>>>>> cycle. I guess >>>>>>>>>>>>>>>>>>>>> streaming writers may just need to write the wall clock >>>>>>>>>>>>>>>>>>>>> time when appending >>>>>>>>>>>>>>>>>>>>> a row to a data file for the default value of >>>>>>>>>>>>>>>>>>>>> current_timestamp. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Fri, Dec 12, 2025 at 1:44 PM Ryan Blue < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I don't think that every row would have a different >>>>>>>>>>>>>>>>>>>>>> value. That would be up to the engine, but I would >>>>>>>>>>>>>>>>>>>>>> expect engines to insert >>>>>>>>>>>>>>>>>>>>>> `CURRENT_TIMESTAMP` into the plan and then replace it >>>>>>>>>>>>>>>>>>>>>> with a constant, >>>>>>>>>>>>>>>>>>>>>> resulting in a consistent value for all rows. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> You're right that this would not necessarily be the >>>>>>>>>>>>>>>>>>>>>> commit time. But neither is the commit timestamp from >>>>>>>>>>>>>>>>>>>>>> Iceberg's snapshot. >>>>>>>>>>>>>>>>>>>>>> I'm not sure how we are going to define "good enough" >>>>>>>>>>>>>>>>>>>>>> for this purpose. I >>>>>>>>>>>>>>>>>>>>>> think at least `CURRENT_TIMESTAMP` has reliable and >>>>>>>>>>>>>>>>>>>>>> known behavior when you >>>>>>>>>>>>>>>>>>>>>> look at how it is handled in engines. And if you want >>>>>>>>>>>>>>>>>>>>>> the Iceberg >>>>>>>>>>>>>>>>>>>>>> timestamp, then use a periodic query of the snapshot >>>>>>>>>>>>>>>>>>>>>> stable to keep track >>>>>>>>>>>>>>>>>>>>>> of them in a table you can join to. I don't think this >>>>>>>>>>>>>>>>>>>>>> rises to the need >>>>>>>>>>>>>>>>>>>>>> for a table feature unless we can guarantee that it is >>>>>>>>>>>>>>>>>>>>>> correct. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Fri, Dec 12, 2025 at 1:19 PM Steven Wu < >>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> > Postgres `current_timestamp` captures the >>>>>>>>>>>>>>>>>>>>>>> transaction start time [1, 2]. Should we extend the >>>>>>>>>>>>>>>>>>>>>>> same semantic to >>>>>>>>>>>>>>>>>>>>>>> Iceberg: all rows added in the same snapshot should >>>>>>>>>>>>>>>>>>>>>>> have the same timestamp >>>>>>>>>>>>>>>>>>>>>>> value? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Let me clarify my last comment. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> created_at TIMESTAMP WITH TIME ZONE DEFAULT >>>>>>>>>>>>>>>>>>>>>>> CURRENT_TIMESTAMP) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Since Postgres current_timestamp captures the >>>>>>>>>>>>>>>>>>>>>>> transaction start time, all rows added in the same >>>>>>>>>>>>>>>>>>>>>>> insert transaction would >>>>>>>>>>>>>>>>>>>>>>> have the same value as the transaction timestamp with >>>>>>>>>>>>>>>>>>>>>>> the column >>>>>>>>>>>>>>>>>>>>>>> definition above. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> If we extend a similar semantic to Iceberg, all rows >>>>>>>>>>>>>>>>>>>>>>> added in the same Iceberg transaction/snapshot should >>>>>>>>>>>>>>>>>>>>>>> have the same >>>>>>>>>>>>>>>>>>>>>>> timestamp? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Ryan, I understand your comment for using >>>>>>>>>>>>>>>>>>>>>>> current_timestamp expression as column default value, >>>>>>>>>>>>>>>>>>>>>>> you were thinking >>>>>>>>>>>>>>>>>>>>>>> that the engine would set the column value to the wall >>>>>>>>>>>>>>>>>>>>>>> clock time when >>>>>>>>>>>>>>>>>>>>>>> appending a row to a data file, right? every row would >>>>>>>>>>>>>>>>>>>>>>> almost have a >>>>>>>>>>>>>>>>>>>>>>> different timestamp value. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Fri, Dec 12, 2025 at 10:26 AM Steven Wu < >>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> `current_timestamp` expression may not always carry >>>>>>>>>>>>>>>>>>>>>>>> the right semantic for the use cases. E.g., latency >>>>>>>>>>>>>>>>>>>>>>>> tracking is interested >>>>>>>>>>>>>>>>>>>>>>>> in when records are added / committed to the table, >>>>>>>>>>>>>>>>>>>>>>>> not when the record was >>>>>>>>>>>>>>>>>>>>>>>> appended to an uncommitted data file in the processing >>>>>>>>>>>>>>>>>>>>>>>> engine. >>>>>>>>>>>>>>>>>>>>>>>> Record creation and Iceberg commit can be minutes or >>>>>>>>>>>>>>>>>>>>>>>> even hours apart. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Row timestamp inherited from snapshot timestamp has >>>>>>>>>>>>>>>>>>>>>>>> no overhead with the initial commit and has very >>>>>>>>>>>>>>>>>>>>>>>> minimal storage overhead >>>>>>>>>>>>>>>>>>>>>>>> during file rewrite. Per-row current_timestamp would >>>>>>>>>>>>>>>>>>>>>>>> have distinct values >>>>>>>>>>>>>>>>>>>>>>>> for every row and has more storage overhead. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> OLTP databases deal with small row-level >>>>>>>>>>>>>>>>>>>>>>>> transactions. Postgres `current_timestamp` captures >>>>>>>>>>>>>>>>>>>>>>>> the transaction start >>>>>>>>>>>>>>>>>>>>>>>> time [1, 2]. Should we extend the same semantic to >>>>>>>>>>>>>>>>>>>>>>>> Iceberg: all rows added >>>>>>>>>>>>>>>>>>>>>>>> in the same snapshot should have the same timestamp >>>>>>>>>>>>>>>>>>>>>>>> value? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>> https://www.postgresql.org/docs/current/functions-datetime.html >>>>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>>>> https://neon.com/postgresql/postgresql-date-functions/postgresql-current_timestamp >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 11, 2025 at 4:07 PM Micah Kornfield < >>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Micah, are 1 and 2 the same? 3 is covered by this >>>>>>>>>>>>>>>>>>>>>>>>>> proposal. >>>>>>>>>>>>>>>>>>>>>>>>>> To support the created_by timestamp, we would >>>>>>>>>>>>>>>>>>>>>>>>>> need to implement the following row lineage behavior >>>>>>>>>>>>>>>>>>>>>>>>>> * Initially, it inherits from the snapshot >>>>>>>>>>>>>>>>>>>>>>>>>> timestamp >>>>>>>>>>>>>>>>>>>>>>>>>> * during rewrite (like compaction), it should be >>>>>>>>>>>>>>>>>>>>>>>>>> persisted into data files. >>>>>>>>>>>>>>>>>>>>>>>>>> * during update, it needs to be carried over from >>>>>>>>>>>>>>>>>>>>>>>>>> the previous row. This is similar to the row_id >>>>>>>>>>>>>>>>>>>>>>>>>> carry over for row updates. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the short hand. These are not the same: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Insertion time - time the row was inserted. >>>>>>>>>>>>>>>>>>>>>>>>> 2. Create by - The system that created the record. >>>>>>>>>>>>>>>>>>>>>>>>> 3. Updated by - The system that last updated the >>>>>>>>>>>>>>>>>>>>>>>>> record. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Depending on the exact use-case these might or >>>>>>>>>>>>>>>>>>>>>>>>> might not have utility. I'm just wondering if there >>>>>>>>>>>>>>>>>>>>>>>>> will be more example >>>>>>>>>>>>>>>>>>>>>>>>> like this in the future. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> created_by column would incur likely significantly >>>>>>>>>>>>>>>>>>>>>>>>>> higher storage overhead compared to the updated_by >>>>>>>>>>>>>>>>>>>>>>>>>> column. As rows are >>>>>>>>>>>>>>>>>>>>>>>>>> updated overtime, the cardinality for this column in >>>>>>>>>>>>>>>>>>>>>>>>>> data files can be >>>>>>>>>>>>>>>>>>>>>>>>>> high. Hence, the created_by column may not compress >>>>>>>>>>>>>>>>>>>>>>>>>> well. This is a similar >>>>>>>>>>>>>>>>>>>>>>>>>> problem for the row_id column. One side effect of >>>>>>>>>>>>>>>>>>>>>>>>>> enabling row lineage by >>>>>>>>>>>>>>>>>>>>>>>>>> default for V3 tables is the storage overhead of >>>>>>>>>>>>>>>>>>>>>>>>>> row_id column after >>>>>>>>>>>>>>>>>>>>>>>>>> compaction especially for narrow tables with few >>>>>>>>>>>>>>>>>>>>>>>>>> columns. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I agree. I think this analysis also shows that >>>>>>>>>>>>>>>>>>>>>>>>> some consumers of Iceberg might not necessarily want >>>>>>>>>>>>>>>>>>>>>>>>> to have all these >>>>>>>>>>>>>>>>>>>>>>>>> columns, so we might want to make them configurable, >>>>>>>>>>>>>>>>>>>>>>>>> rather than mandating >>>>>>>>>>>>>>>>>>>>>>>>> them for all tables. Ryan's thought on default values >>>>>>>>>>>>>>>>>>>>>>>>> seems like it would >>>>>>>>>>>>>>>>>>>>>>>>> solve the issues I was raising. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> Micah >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 11, 2025 at 3:47 PM Ryan Blue < >>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> > An explicit timestamp column adds more burden >>>>>>>>>>>>>>>>>>>>>>>>>> to application developers. While some databases >>>>>>>>>>>>>>>>>>>>>>>>>> require an explicit column >>>>>>>>>>>>>>>>>>>>>>>>>> in the schema, those databases provide triggers to >>>>>>>>>>>>>>>>>>>>>>>>>> auto set the column >>>>>>>>>>>>>>>>>>>>>>>>>> value. For Iceberg, the snapshot timestamp is the >>>>>>>>>>>>>>>>>>>>>>>>>> closest to the trigger >>>>>>>>>>>>>>>>>>>>>>>>>> timestamp. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Since the use cases don't require an exact >>>>>>>>>>>>>>>>>>>>>>>>>> timestamp, this seems like the best solution to get >>>>>>>>>>>>>>>>>>>>>>>>>> what people want (an >>>>>>>>>>>>>>>>>>>>>>>>>> insertion timestamp) that has clear and well-defined >>>>>>>>>>>>>>>>>>>>>>>>>> behavior. Since >>>>>>>>>>>>>>>>>>>>>>>>>> `current_timestamp` is defined by the SQL spec, it >>>>>>>>>>>>>>>>>>>>>>>>>> makes sense to me that >>>>>>>>>>>>>>>>>>>>>>>>>> we could use it and have reasonable behavior. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I've talked with Anton about this before and >>>>>>>>>>>>>>>>>>>>>>>>>> maybe he'll jump in on this thread. I think that we >>>>>>>>>>>>>>>>>>>>>>>>>> may need to extend >>>>>>>>>>>>>>>>>>>>>>>>>> default values to include default value expressions, >>>>>>>>>>>>>>>>>>>>>>>>>> like >>>>>>>>>>>>>>>>>>>>>>>>>> `current_timestamp` that is allowed by the SQL spec. >>>>>>>>>>>>>>>>>>>>>>>>>> That would solve the >>>>>>>>>>>>>>>>>>>>>>>>>> problem as well as some others (like `current_date` >>>>>>>>>>>>>>>>>>>>>>>>>> or `current_user`) and >>>>>>>>>>>>>>>>>>>>>>>>>> would not create a potentially misleading (and >>>>>>>>>>>>>>>>>>>>>>>>>> heavyweight) timestamp >>>>>>>>>>>>>>>>>>>>>>>>>> feature in the format. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> > Also some environments may have stronger clock >>>>>>>>>>>>>>>>>>>>>>>>>> service, like Spanner TrueTime service. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Even in cases like this, commit retries can >>>>>>>>>>>>>>>>>>>>>>>>>> reorder commits and make timestamps out of order. I >>>>>>>>>>>>>>>>>>>>>>>>>> don't think that we >>>>>>>>>>>>>>>>>>>>>>>>>> should be making guarantees or even exposing >>>>>>>>>>>>>>>>>>>>>>>>>> metadata that people might >>>>>>>>>>>>>>>>>>>>>>>>>> mistake as having those guarantees. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 9, 2025 at 2:22 PM Steven Wu < >>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Ryan, thanks a lot for the feedback! >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the concern for reliable timestamps, >>>>>>>>>>>>>>>>>>>>>>>>>>> we are not proposing using timestamps for ordering. >>>>>>>>>>>>>>>>>>>>>>>>>>> With NTP in modern >>>>>>>>>>>>>>>>>>>>>>>>>>> computers, they are generally reliable enough for >>>>>>>>>>>>>>>>>>>>>>>>>>> the intended use cases. >>>>>>>>>>>>>>>>>>>>>>>>>>> Also some environments may have stronger clock >>>>>>>>>>>>>>>>>>>>>>>>>>> service, like Spanner >>>>>>>>>>>>>>>>>>>>>>>>>>> TrueTime service >>>>>>>>>>>>>>>>>>>>>>>>>>> <https://docs.cloud.google.com/spanner/docs/true-time-external-consistency> >>>>>>>>>>>>>>>>>>>>>>>>>>> . >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> > joining to timestamps from the snapshots >>>>>>>>>>>>>>>>>>>>>>>>>>> metadata table. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> As you also mentioned, it depends on the >>>>>>>>>>>>>>>>>>>>>>>>>>> snapshot history, which is often retained for a few >>>>>>>>>>>>>>>>>>>>>>>>>>> days due to performance >>>>>>>>>>>>>>>>>>>>>>>>>>> reasons. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> > embedding a timestamp in DML (like >>>>>>>>>>>>>>>>>>>>>>>>>>> `current_timestamp`) rather than relying on an >>>>>>>>>>>>>>>>>>>>>>>>>>> implicit one from table >>>>>>>>>>>>>>>>>>>>>>>>>>> metadata. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> An explicit timestamp column adds more burden to >>>>>>>>>>>>>>>>>>>>>>>>>>> application developers. While some databases >>>>>>>>>>>>>>>>>>>>>>>>>>> require an explicit column in >>>>>>>>>>>>>>>>>>>>>>>>>>> the schema, those databases provide triggers to >>>>>>>>>>>>>>>>>>>>>>>>>>> auto set the column value. >>>>>>>>>>>>>>>>>>>>>>>>>>> For Iceberg, the snapshot timestamp is the closest >>>>>>>>>>>>>>>>>>>>>>>>>>> to the trigger timestamp. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Also, the timestamp set during computation (like >>>>>>>>>>>>>>>>>>>>>>>>>>> streaming ingestion or relative long batch >>>>>>>>>>>>>>>>>>>>>>>>>>> computation) doesn't capture the >>>>>>>>>>>>>>>>>>>>>>>>>>> time the rows/files are added to the Iceberg table >>>>>>>>>>>>>>>>>>>>>>>>>>> in a batch fashion. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> > And for those use cases, you could also keep a >>>>>>>>>>>>>>>>>>>>>>>>>>> longer history of snapshot timestamps, like storing >>>>>>>>>>>>>>>>>>>>>>>>>>> a catalog's event log >>>>>>>>>>>>>>>>>>>>>>>>>>> for long-term access to timestamp info >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> this is not really consumable by joining the >>>>>>>>>>>>>>>>>>>>>>>>>>> regular table query with catalog event log. I would >>>>>>>>>>>>>>>>>>>>>>>>>>> also imagine catalog >>>>>>>>>>>>>>>>>>>>>>>>>>> event log is capped at shorter retention (maybe a >>>>>>>>>>>>>>>>>>>>>>>>>>> few months) compared to >>>>>>>>>>>>>>>>>>>>>>>>>>> data retention (could be a few years). >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 9, 2025 at 1:32 PM Ryan Blue < >>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think it is a good idea to expose >>>>>>>>>>>>>>>>>>>>>>>>>>>> timestamps at the row level. Timestamps in >>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata that would be carried >>>>>>>>>>>>>>>>>>>>>>>>>>>> down to the row level already confuse people that >>>>>>>>>>>>>>>>>>>>>>>>>>>> expect them to be useful >>>>>>>>>>>>>>>>>>>>>>>>>>>> or reliable, rather than for debugging. I think >>>>>>>>>>>>>>>>>>>>>>>>>>>> extending this to the row >>>>>>>>>>>>>>>>>>>>>>>>>>>> level would only make the problem worse. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> You can already get this information by >>>>>>>>>>>>>>>>>>>>>>>>>>>> projecting the last updated sequence number, which >>>>>>>>>>>>>>>>>>>>>>>>>>>> is reliable, and joining >>>>>>>>>>>>>>>>>>>>>>>>>>>> to timestamps from the snapshots metadata table. >>>>>>>>>>>>>>>>>>>>>>>>>>>> Of course, the drawback >>>>>>>>>>>>>>>>>>>>>>>>>>>> there is losing the timestamp information when >>>>>>>>>>>>>>>>>>>>>>>>>>>> snapshots expire, but since >>>>>>>>>>>>>>>>>>>>>>>>>>>> it isn't reliable anyway I'd be fine with that. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Some of the use cases, like auditing and >>>>>>>>>>>>>>>>>>>>>>>>>>>> compliance, are probably better served by >>>>>>>>>>>>>>>>>>>>>>>>>>>> embedding a timestamp in DML >>>>>>>>>>>>>>>>>>>>>>>>>>>> (like `current_timestamp`) rather than relying on >>>>>>>>>>>>>>>>>>>>>>>>>>>> an implicit one from >>>>>>>>>>>>>>>>>>>>>>>>>>>> table metadata. And for those use cases, you could >>>>>>>>>>>>>>>>>>>>>>>>>>>> also keep a longer >>>>>>>>>>>>>>>>>>>>>>>>>>>> history of snapshot timestamps, like storing a >>>>>>>>>>>>>>>>>>>>>>>>>>>> catalog's event log for >>>>>>>>>>>>>>>>>>>>>>>>>>>> long-term access to timestamp info. I think that >>>>>>>>>>>>>>>>>>>>>>>>>>>> would be better than >>>>>>>>>>>>>>>>>>>>>>>>>>>> storing it at the row level. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Dec 8, 2025 at 3:46 PM Steven Wu < >>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> For V4 spec, I have a small proposal [1] to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> expose the row timestamp concept that can help >>>>>>>>>>>>>>>>>>>>>>>>>>>>> with many use cases like >>>>>>>>>>>>>>>>>>>>>>>>>>>>> temporal queries, latency tracking, TTL, auditing >>>>>>>>>>>>>>>>>>>>>>>>>>>>> and compliance. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> This *_last_updated_timestamp_ms * metadata >>>>>>>>>>>>>>>>>>>>>>>>>>>>> column behaves very similarly to the >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *_last_updated_sequence_number* for row >>>>>>>>>>>>>>>>>>>>>>>>>>>>> lineage. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Initially, it inherits from the snapshot >>>>>>>>>>>>>>>>>>>>>>>>>>>>> timestamp. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - During rewrite (like compaction), its >>>>>>>>>>>>>>>>>>>>>>>>>>>>> values are persisted in the data files. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would love to hear what you think. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Steven >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?usp=sharing >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
