I've gone back and forth on the inherited columns. What keeps coming back to me is that I don't like that the only way to determine the timestamp associated with a row update or creation is to do a join back against table metadata. While that's doable, it feels user-unfriendly.
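For concreteness, a rough sketch of the join in question against the Java API: build a sequence-number-to-commit-time map from retained snapshots, then join rows' _last_updated_sequence_number against it. This is illustrative only; the point stands that entries vanish once snapshots expire.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.iceberg.Snapshot;
    import org.apache.iceberg.Table;

    public class SnapshotTimestampLookup {
      // Maps each retained snapshot's sequence number to its commit time in
      // millis; rows can join _last_updated_sequence_number against this.
      static Map<Long, Long> sequenceToCommitMillis(Table table) {
        Map<Long, Long> lookup = new HashMap<>();
        for (Snapshot snapshot : table.snapshots()) {
          lookup.put(snapshot.sequenceNumber(), snapshot.timestampMillis());
        }
        return lookup;
      }
    }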
On Fri, Jan 16, 2026 at 11:54 AM Steven Wu <[email protected]> wrote:

Anton, you are right that row-level deletes will be a problem for some of the mentioned use cases (like incremental processing). I have clarified the applicability of some use cases to "tables with inserts and updates only".

Right now, we are only tracking modification/commit time (not insertion time) in the case of updates.

On Thu, Jan 15, 2026 at 6:33 PM Anton Okolnychyi <[email protected]> wrote:

I think there is clear consensus that making snapshot timestamps strictly increasing is a positive thing. I am also +1.

- How will row timestamps allow us to reliably implement incremental consumption independent of the snapshot retention, given that rows can be added AND removed in a particular time frame? How can we capture all changes by just looking at the latest snapshot?
- Some use cases in the doc need the insertion time and some need the last modification time. Do we plan to support both?
- What do we expect the behavior to be in UPDATE and MERGE operations?

To be clear: I am not opposed to this change; I just want to make sure I understand all the use cases that we aim to address and what would be required in engines.

On Thu, Jan 15, 2026 at 5:01 PM Maninder Parmar <[email protected]> wrote:

+1 for improving how the commit timestamps are assigned monotonically, since this requirement has emerged over multiple discussions (notifications, multi-table transactions, time travel accuracy, and row timestamps). It would be good to have a single consistent way to represent and assign timestamps that could be leveraged across multiple features.

On Thu, Jan 15, 2026 at 4:05 PM Ryan Blue <[email protected]> wrote:

Yeah, to add my perspective on that discussion: my primary concern is that people expect timestamps to be monotonic, and if they aren't, then a `_last_update_timestamp` field just makes the problem worse. But it is _nice_ to have row-level timestamps. So I would be okay if we revisit how we assign commit timestamps and improve it so that you get monotonic behavior.

On Thu, Jan 15, 2026 at 2:23 PM Steven Wu <[email protected]> wrote:

We had an offline discussion with Ryan. I revised the proposal as follows.

1. V4 would require writers to generate *monotonic* snapshot timestamps. The proposal doc has a section that describes a recommended implementation using Lamport timestamps.
2. Expose a *last_update_timestamp* metadata column that inherits from the snapshot timestamp.

This is a relatively low-friction change that can fix the time travel problem and enable use cases like latency tracking, temporal queries, TTL, and auditing.

There is no accuracy requirement on the timestamp values. In practice, modern servers with NTP have pretty reliable wall clocks. E.g., the Java library implements a validation <https://github.com/apache/iceberg/blob/035e0fb39d2a949f6343552ade0a7d6c2967e0db/core/src/main/java/org/apache/iceberg/TableMetadata.java#L369-L377> that protects against backward clock drift of up to one minute for snapshot timestamps. I don't think we have heard many complaints of commit failures due to that clock drift validation.

Would appreciate feedback on the revised proposal:
https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?tab=t.0

Thanks,
Steven
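For illustration, a minimal sketch of a Lamport-style monotonic clock along the lines the proposal recommends: take the wall clock, but never move backward relative to the last observed snapshot timestamp. This is an assumption-laden sketch, not the proposal's normative wording; the class and method names are hypothetical.

    import java.util.concurrent.atomic.AtomicLong;

    public class MonotonicCommitClock {
      private final AtomicLong lastNanos = new AtomicLong();

      // Lamport "receive" step: fold in the latest snapshot timestamp read
      // from the table before committing.
      public void observe(long snapshotNanos) {
        lastNanos.accumulateAndGet(snapshotNanos, Math::max);
      }

      // Returns a strictly increasing nanosecond timestamp: the wall clock
      // when it has moved forward, otherwise the previous value plus one.
      public long nextCommitTimestampNanos() {
        long wallNanos = System.currentTimeMillis() * 1_000_000L;
        return lastNanos.updateAndGet(prev -> Math.max(prev + 1, wallNanos));
      }
    }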
On Tue, Jan 13, 2026 at 8:40 PM Anton Okolnychyi <[email protected]> wrote:

Steven, I was referring to the fact that CURRENT_TIMESTAMP() is usually evaluated quite early in engines, so we could theoretically have another expression closer to the commit time. You are right, though, that it won't be the actual commit time given that we have to write it into the files. Also, I don't think generating a timestamp for a row as it is being written is going to be beneficial. To sum up, expression-based defaults would allow us to capture the time the transaction or write starts, but not the actual commit time.

Russell, if the goal is to know what happened to the table in a given time frame, isn't the changelog scan the way to go? It would assign commit ordinals based on lineage and include row-level diffs. How would you be able to determine changes with row timestamps by just looking at the latest snapshot?

It does seem promising to make snapshot timestamps strictly increasing to avoid ambiguity during time travel.

On Tue, Jan 13, 2026 at 4:33 PM Ryan Blue <[email protected]> wrote:

> Whether or not "t" is an atomic clock time is not as important as the query between time bounds making sense.

I'm not sure I get it, then. If we want monotonically increasing times, but they don't have to be real times, then how do you know what notion of "time" you care about for these filters? Or to put it another way, how do you know that your "before" and "after" times are reasonable? If the boundaries of these time queries can move around a bit, by how much?

It seems to me that row IDs can play an important role here because you have the order guarantee that we seem to want for this use case: if snapshot A was committed before snapshot B, then the rows from A have row IDs that are always less than the row IDs of B. The problem is that we don't know where those row IDs start and end once A and B are no longer tracked. Using a "timestamp" seems to work, but I still worry that without reliable timestamps that correspond with some guarantee to real timestamps, we are creating a feature that seems reliable but isn't.

I'm somewhat open to the idea of introducing a snapshot timestamp that the catalog guarantees is monotonically increasing. But if we did that, wouldn't we still need to know the association between these timestamps and snapshots after the snapshot metadata expires? My mental model is that this would be used to look for data that arrived, say, 3 weeks ago on Dec 24th. Since the snapshot metadata is no longer around, we could use the row timestamp to find those rows. But how do we know that the snapshot timestamps correspond to the actual timestamp range of Dec 24th? Is it just "close enough" as long as we don't have out-of-order timestamps? This is what I mean by needing to keep track of the association between timestamps and snapshots after the metadata expires.
Seems like you either need to keep track of what the catalog's clock was for events you care about, or you don't really care about exact timestamps.

On Tue, Jan 13, 2026 at 2:22 PM Russell Spitzer <[email protected]> wrote:

The key goal here is the ability to answer the question "what happened to the table in some time window (before < t < after)?" Whether or not "t" is an atomic clock time is not as important as the query between time bounds making sense. Downstream applications (from what I know) are mostly sensitive to getting discrete and well-defined answers to this question, like:

1 < t < 2 should be exclusive of
2 < t < 3 should be exclusive of
3 < t < 4

And the union of these should be the same as the query asking for 1 < t < 4.

Currently this is not possible because we have no guarantee of ordering in our timestamps:

Snapshots:        A  -> B  -> C
Sequence numbers: 50 -> 51 -> 52
Timestamps:       3  -> 1  -> 2

This makes time travel always a little wrong to start with.

The Java implementation only allows one minute of negative time on commit, so we actually kind of do have this as a "light monotonicity" requirement, but as noted above there is no spec requirement for this. While we do have sequence numbers and row IDs, we still don't have a stable way of associating these with a consistent time in an engine-independent way.

Ideally we just want one consistent way of answering the question "what did the table look like at time t", which I think we get by adding a new field that is a timestamp, set by the catalog close to commit time, that always goes up.

I'm not sure we can really do this with an engine expression, since engines won't know when the data is actually committed when writing files.
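To illustrate the window semantics above, a rough Spark sketch assuming the proposed metadata column (here called _last_updated_timestamp_ms, per the original proposal) were queryable and monotonic; the table name db.tbl is hypothetical. Half-open windows make adjacent ranges disjoint and their union equal to the full range:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class WindowedReads {
      // Rows whose last update falls in [startMs, endMs). Adjacent windows
      // are disjoint, and window(1, 2) union window(2, 3) == window(1, 3).
      static Dataset<Row> window(SparkSession spark, long startMs, long endMs) {
        return spark.sql(String.format(
            "SELECT * FROM db.tbl WHERE _last_updated_timestamp_ms >= %d "
                + "AND _last_updated_timestamp_ms < %d",
            startMs, endMs));
      }
    }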
On Tue, Jan 13, 2026 at 3:35 PM Anton Okolnychyi <[email protected]> wrote:

This seems like a lot of new complexity in the format. I would like us to explore whether we can build the considered use cases on top of expression-based defaults instead.

We already plan to support CURRENT_TIMESTAMP() and similar functions that are part of the SQL standard definition for default values. This would provide us a way to know the relative row order. True, this usually will represent the start of the operation. We may define COMMIT_TIMESTAMP() or a similar expression for the actual commit time, if there are use cases that need that. Plus, we may explore an approach similar to MySQL that allows users to reset the default value on update.

- Anton

On Tue, Jan 13, 2026 at 11:04 AM Russell Spitzer <[email protected]> wrote:

I think this is the right step forward. Our current "timestamp" definition is too ambiguous to be useful, so establishing a well-defined and monotonic timestamp could be really great. I also like the ability for rows to know this value without having to rely on snapshot information, which can be expired.

On Mon, Jan 12, 2026 at 11:03 AM Steven Wu <[email protected]> wrote:

Hi all,

I have revised the row timestamp proposal with the following changes:
* a new commit_timestamp field in snapshot metadata that has nanosecond precision
* this optional field is only set by the REST catalog server
* it needs to be monotonic (e.g., implemented using a Lamport timestamp)

https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?tab=t.0#heading=h.efdngoizchuh

Thanks,
Steven

On Fri, Dec 12, 2025 at 2:36 PM Steven Wu <[email protected]> wrote:

Thanks for the clarification, Ryan.

For long-running streaming jobs that commit periodically, it is difficult to establish a constant value of current_timestamp across all writer tasks for each commit cycle. I guess streaming writers may just need to write the wall clock time when appending a row to a data file for the default value of current_timestamp.

On Fri, Dec 12, 2025 at 1:44 PM Ryan Blue <[email protected]> wrote:

I don't think that every row would have a different value. That would be up to the engine, but I would expect engines to insert `CURRENT_TIMESTAMP` into the plan and then replace it with a constant, resulting in a consistent value for all rows.

You're right that this would not necessarily be the commit time. But neither is the commit timestamp from Iceberg's snapshot. I'm not sure how we are going to define "good enough" for this purpose. I think at least `CURRENT_TIMESTAMP` has reliable and known behavior when you look at how it is handled in engines. And if you want the Iceberg timestamp, then use a periodic query of the snapshots table to keep track of them in a table you can join to. I don't think this rises to the need for a table feature unless we can guarantee that it is correct.

On Fri, Dec 12, 2025 at 1:19 PM Steven Wu <[email protected]> wrote:

> Postgres `current_timestamp` captures the transaction start time [1, 2]. Should we extend the same semantic to Iceberg: all rows added in the same snapshot should have the same timestamp value?

Let me clarify my last comment.

created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP

Since Postgres current_timestamp captures the transaction start time, all rows added in the same insert transaction would have the same value, the transaction timestamp, with the column definition above.
If we extend a similar semantic to Iceberg, should all rows added in the same Iceberg transaction/snapshot have the same timestamp?

Ryan, regarding your comment about using the current_timestamp expression as a column default value: you were thinking that the engine would set the column value to the wall clock time when appending a row to a data file, right? In that case, almost every row would have a different timestamp value.

On Fri, Dec 12, 2025 at 10:26 AM Steven Wu <[email protected]> wrote:

The `current_timestamp` expression may not always carry the right semantics for the use cases. E.g., latency tracking is interested in when records are added/committed to the table, not when the record was appended to an uncommitted data file in the processing engine. Record creation and Iceberg commit can be minutes or even hours apart.

A row timestamp inherited from the snapshot timestamp has no overhead at the initial commit and very minimal storage overhead during file rewrite. A per-row current_timestamp would have distinct values for every row and more storage overhead.

OLTP databases deal with small row-level transactions. Postgres `current_timestamp` captures the transaction start time [1, 2]. Should we extend the same semantic to Iceberg: all rows added in the same snapshot should have the same timestamp value?

[1] https://www.postgresql.org/docs/current/functions-datetime.html
[2] https://neon.com/postgresql/postgresql-date-functions/postgresql-current_timestamp
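For illustration, a small JDBC sketch of the Postgres semantics cited in [1, 2], assuming a hypothetical events table whose created_at column uses the DEFAULT CURRENT_TIMESTAMP definition shown earlier. Both inserts run in one transaction, so both rows receive the same transaction-start timestamp:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class SameTransactionTimestamp {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
            DriverManager.getConnection("jdbc:postgresql://localhost/demo")) {
          conn.setAutoCommit(false);
          try (Statement stmt = conn.createStatement()) {
            // created_at is filled by DEFAULT CURRENT_TIMESTAMP, which in
            // Postgres is the transaction start time, not the statement time.
            stmt.executeUpdate("INSERT INTO events (payload) VALUES ('a')");
            stmt.executeUpdate("INSERT INTO events (payload) VALUES ('b')");
          }
          conn.commit(); // both rows share an identical created_at
        }
      }
    }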
On Thu, Dec 11, 2025 at 4:07 PM Micah Kornfield <[email protected]> wrote:

> Micah, are 1 and 2 the same? 3 is covered by this proposal. To support the created_by timestamp, we would need to implement the following row lineage behavior:
> * Initially, it inherits from the snapshot timestamp.
> * During rewrite (like compaction), it should be persisted into data files.
> * During update, it needs to be carried over from the previous row. This is similar to the row_id carry-over for row updates.

Sorry for the shorthand. These are not the same:

1. Insertion time - the time the row was inserted.
2. Created by - the system that created the record.
3. Updated by - the system that last updated the record.

Depending on the exact use case, these might or might not have utility. I'm just wondering if there will be more examples like this in the future.

> A created_by column would likely incur significantly higher storage overhead compared to the updated_by column. As rows are updated over time, the cardinality of this column in data files can be high. Hence, the created_by column may not compress well. This is a similar problem for the row_id column. One side effect of enabling row lineage by default for V3 tables is the storage overhead of the row_id column after compaction, especially for narrow tables with few columns.

I agree. I think this analysis also shows that some consumers of Iceberg might not necessarily want all of these columns, so we might want to make them configurable rather than mandating them for all tables. Ryan's thought on default values seems like it would solve the issues I was raising.

Thanks,
Micah

On Thu, Dec 11, 2025 at 3:47 PM Ryan Blue <[email protected]> wrote:

> An explicit timestamp column adds more burden to application developers. While some databases require an explicit column in the schema, those databases provide triggers to auto-set the column value. For Iceberg, the snapshot timestamp is the closest to the trigger timestamp.

Since the use cases don't require an exact timestamp, this seems like the best solution to get what people want (an insertion timestamp) with clear and well-defined behavior. Since `current_timestamp` is defined by the SQL spec, it makes sense to me that we could use it and have reasonable behavior.

I've talked with Anton about this before and maybe he'll jump in on this thread. I think that we may need to extend default values to include default value expressions, like `current_timestamp`, which is allowed by the SQL spec. That would solve the problem as well as some others (like `current_date` or `current_user`) and would not create a potentially misleading (and heavyweight) timestamp feature in the format.

> Also some environments may have a stronger clock service, like Spanner's TrueTime service.

Even in cases like this, commit retries can reorder commits and make timestamps out of order. I don't think that we should be making guarantees, or even exposing metadata that people might mistake as having those guarantees.
On Tue, Dec 9, 2025 at 2:22 PM Steven Wu <[email protected]> wrote:

Ryan, thanks a lot for the feedback!

Regarding the concern about reliable timestamps: we are not proposing using timestamps for ordering. With NTP in modern computers, they are generally reliable enough for the intended use cases. Also, some environments may have a stronger clock service, like Spanner's TrueTime service <https://docs.cloud.google.com/spanner/docs/true-time-external-consistency>.

> joining to timestamps from the snapshots metadata table.

As you also mentioned, this depends on the snapshot history, which is often retained for only a few days for performance reasons.

> embedding a timestamp in DML (like `current_timestamp`) rather than relying on an implicit one from table metadata.

An explicit timestamp column adds more burden to application developers. While some databases require an explicit column in the schema, those databases provide triggers to auto-set the column value. For Iceberg, the snapshot timestamp is the closest to the trigger timestamp.

Also, a timestamp set during computation (like streaming ingestion or a relatively long batch computation) doesn't capture the time the rows/files are added to the Iceberg table in a batch fashion.

> And for those use cases, you could also keep a longer history of snapshot timestamps, like storing a catalog's event log for long-term access to timestamp info.

This is not really consumable by joining a regular table query with a catalog event log. I would also imagine a catalog event log is capped at a shorter retention (maybe a few months) compared to data retention (which could be a few years).

On Tue, Dec 9, 2025 at 1:32 PM Ryan Blue <[email protected]> wrote:

I don't think it is a good idea to expose timestamps at the row level. Timestamps in metadata that would be carried down to the row level already confuse people who expect them to be useful or reliable, rather than for debugging. I think extending this to the row level would only make the problem worse.

You can already get this information by projecting the last updated sequence number, which is reliable, and joining to timestamps from the snapshots metadata table.
Of course, the drawback there is losing the timestamp information when snapshots expire, but since it isn't reliable anyway, I'd be fine with that.

Some of the use cases, like auditing and compliance, are probably better served by embedding a timestamp in DML (like `current_timestamp`) rather than relying on an implicit one from table metadata. And for those use cases, you could also keep a longer history of snapshot timestamps, like storing a catalog's event log for long-term access to timestamp info. I think that would be better than storing it at the row level.

On Mon, Dec 8, 2025 at 3:46 PM Steven Wu <[email protected]> wrote:

Hi,

For the V4 spec, I have a small proposal [1] to expose a row timestamp concept that can help with many use cases like temporal queries, latency tracking, TTL, auditing, and compliance.

This *_last_updated_timestamp_ms* metadata column behaves very similarly to *_last_updated_sequence_number* for row lineage:

- Initially, it inherits from the snapshot timestamp.
- During rewrite (like compaction), its values are persisted in the data files.

Would love to hear what you think.

Thanks,
Steven

[1] https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?usp=sharing
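For illustration, a minimal sketch of the inheritance rule those two bullets describe, mirroring how _last_updated_sequence_number inheritance works today. The helper name and the null-means-inherit convention are assumptions for the sketch, not the proposal's wording:

    public class RowTimestampInheritance {
      // A null persisted value means "inherit from the committing snapshot";
      // rewrites such as compaction write the resolved value into data files,
      // so the timestamp survives snapshot expiration.
      static long lastUpdatedTimestampMs(Long persistedMs, long snapshotCommitMs) {
        return persistedMs != null ? persistedMs : snapshotCommitMs;
      }
    }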
