I think this is the right step forward. Our current "timestamp" definition is too ambiguous to be useful, so establishing a well-defined and monotonic timestamp could be really valuable. I also like that rows can carry this value without having to rely on snapshot information, which can expire.
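To make the monotonic requirement concrete, here is a minimal sketch of how a REST catalog server could issue such a timestamp, assuming a single catalog process; the class and method names are illustrative, not from the proposal:

import java.time.Instant;

// Sketch only: a Lamport-style monotonic clock that a REST catalog server
// could use to assign the proposed nanosecond commit_timestamp.
public class MonotonicCommitClock {
  private long lastNanos = 0L;

  // Returns the current wall-clock time in epoch nanos, bumped forward
  // whenever the wall clock stalls or moves backwards, so that issued
  // values strictly increase.
  public synchronized long nextCommitTimestampNanos() {
    Instant now = Instant.now();
    long nowNanos = now.getEpochSecond() * 1_000_000_000L + now.getNano();
    lastNanos = Math.max(nowNanos, lastNanos + 1);
    return lastNanos;
  }
}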
On Mon, Jan 12, 2026 at 11:03 AM Steven Wu <[email protected]> wrote:

> Hi all,
>
> I have revised the row timestamp proposal with the following changes.
> * a new commit_timestamp field in snapshot metadata that has nanosecond precision
> * this optional field is only set by the REST catalog server
> * it needs to be monotonic (e.g., implemented using a Lamport timestamp)
>
> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?tab=t.0#heading=h.efdngoizchuh
>
> Thanks,
> Steven
>
> On Fri, Dec 12, 2025 at 2:36 PM Steven Wu <[email protected]> wrote:
>
>> Thanks for the clarification, Ryan.
>>
>> For long-running streaming jobs that commit periodically, it is difficult to establish a constant value of current_timestamp across all writer tasks for each commit cycle. I guess streaming writers may just need to write the wall-clock time when appending a row to a data file as the default value of current_timestamp.
>>
>> On Fri, Dec 12, 2025 at 1:44 PM Ryan Blue <[email protected]> wrote:
>>
>>> I don't think that every row would have a different value. That would be up to the engine, but I would expect engines to insert `CURRENT_TIMESTAMP` into the plan and then replace it with a constant, resulting in a consistent value for all rows.
>>>
>>> You're right that this would not necessarily be the commit time. But neither is the commit timestamp from Iceberg's snapshot. I'm not sure how we are going to define "good enough" for this purpose. I think at least `CURRENT_TIMESTAMP` has reliable and known behavior when you look at how it is handled in engines. And if you want the Iceberg timestamp, then use a periodic query of the snapshots table to keep track of them in a table you can join to. I don't think this rises to the need for a table feature unless we can guarantee that it is correct.
>>>
>>> On Fri, Dec 12, 2025 at 1:19 PM Steven Wu <[email protected]> wrote:
>>>
>>>> > Postgres `current_timestamp` captures the transaction start time [1, 2]. Should we extend the same semantic to Iceberg: all rows added in the same snapshot should have the same timestamp value?
>>>>
>>>> Let me clarify my last comment.
>>>>
>>>> created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP)
>>>>
>>>> Since Postgres current_timestamp captures the transaction start time, all rows added in the same insert transaction would have the same value as the transaction timestamp with the column definition above.
>>>>
>>>> If we extend a similar semantic to Iceberg, should all rows added in the same Iceberg transaction/snapshot have the same timestamp?
>>>>
>>>> Ryan, to check my understanding of your comment about using the current_timestamp expression as the column default value: you were thinking that the engine would set the column value to the wall-clock time when appending a row to a data file, right? Almost every row would then have a different timestamp value.
>>>>
>>>> On Fri, Dec 12, 2025 at 10:26 AM Steven Wu <[email protected]> wrote:
>>>>
>>>>> The `current_timestamp` expression may not always carry the right semantics for these use cases. E.g., latency tracking is interested in when records are added/committed to the table, not when a record was appended to an uncommitted data file in the processing engine. Record creation and Iceberg commit can be minutes or even hours apart.
>>>>>
>>>>> A row timestamp inherited from the snapshot timestamp has no overhead at the initial commit and very minimal storage overhead during file rewrite. A per-row current_timestamp would have distinct values for every row and more storage overhead.
>>>>>
>>>>> OLTP databases deal with small row-level transactions. Postgres `current_timestamp` captures the transaction start time [1, 2]. Should we extend the same semantic to Iceberg: all rows added in the same snapshot should have the same timestamp value?
>>>>>
>>>>> [1] https://www.postgresql.org/docs/current/functions-datetime.html
>>>>> [2] https://neon.com/postgresql/postgresql-date-functions/postgresql-current_timestamp
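A minimal sketch of the two bindings debated above, per-statement versus per-row; the names are illustrative, not engine code:

import java.time.Instant;
import java.util.List;

// Sketch only: the two CURRENT_TIMESTAMP bindings discussed in this thread.
class TimestampBinding {
  record Row(String payload, Instant createdAt) {}

  // Batch-engine style: CURRENT_TIMESTAMP is replaced by a constant in the
  // plan, so every row written by the statement gets the same value.
  static List<Row> stampPerStatement(List<String> payloads) {
    Instant constant = Instant.now(); // evaluated once per statement
    return payloads.stream().map(p -> new Row(p, constant)).toList();
  }

  // Streaming-writer style: each task stamps the wall-clock time as it
  // appends the row, so values differ from row to row.
  static List<Row> stampPerRow(List<String> payloads) {
    return payloads.stream().map(p -> new Row(p, Instant.now())).toList();
  }
}

Neither binding captures the commit time, which is the gap the snapshot-inherited timestamp is meant to fill.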
>>>>> On Thu, Dec 11, 2025 at 4:07 PM Micah Kornfield <[email protected]> wrote:
>>>>>
>>>>>>> Micah, are 1 and 2 the same? 3 is covered by this proposal. To support the created_by timestamp, we would need to implement the following row lineage behavior:
>>>>>>> * Initially, it inherits from the snapshot timestamp.
>>>>>>> * During rewrite (like compaction), it should be persisted into data files.
>>>>>>> * During update, it needs to be carried over from the previous row. This is similar to the row_id carry-over for row updates.
>>>>>>
>>>>>> Sorry for the shorthand. These are not the same:
>>>>>>
>>>>>> 1. Insertion time - the time the row was inserted.
>>>>>> 2. Created by - the system that created the record.
>>>>>> 3. Updated by - the system that last updated the record.
>>>>>>
>>>>>> Depending on the exact use case these might or might not have utility. I'm just wondering if there will be more examples like this in the future.
>>>>>>
>>>>>>> The created_by column would likely incur significantly higher storage overhead compared to the updated_by column. As rows are updated over time, the cardinality of this column in data files can be high. Hence, the created_by column may not compress well. This is a similar problem for the row_id column. One side effect of enabling row lineage by default for V3 tables is the storage overhead of the row_id column after compaction, especially for narrow tables with few columns.
>>>>>>
>>>>>> I agree. I think this analysis also shows that some consumers of Iceberg might not necessarily want all of these columns, so we might want to make them configurable rather than mandating them for all tables. Ryan's thought on default values seems like it would solve the issues I was raising.
>>>>>>
>>>>>> Thanks,
>>>>>> Micah
>>>>>>
>>>>>> On Thu, Dec 11, 2025 at 3:47 PM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> > An explicit timestamp column adds more burden to application developers. While some databases require an explicit column in the schema, those databases provide triggers to auto-set the column value. For Iceberg, the snapshot timestamp is the closest to the trigger timestamp.
>>>>>>>
>>>>>>> Since the use cases don't require an exact timestamp, this seems like the best solution to get what people want (an insertion timestamp) that has clear and well-defined behavior. Since `current_timestamp` is defined by the SQL spec, it makes sense to me that we could use it and have reasonable behavior.
>>>>>>>
>>>>>>> I've talked with Anton about this before and maybe he'll jump in on this thread. I think that we may need to extend default values to include default value expressions, like the `current_timestamp` expression allowed by the SQL spec. That would solve this problem as well as some others (like `current_date` or `current_user`) and would not create a potentially misleading (and heavyweight) timestamp feature in the format.
>>>>>>>
>>>>>>> > Also some environments may have a stronger clock service, like the Spanner TrueTime service.
>>>>>>>
>>>>>>> Even in cases like this, commit retries can reorder commits and make timestamps out of order. I don't think that we should be making guarantees, or even exposing metadata that people might mistake as having those guarantees.
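A rough sketch of the expression-default idea: the writer resolves each expression once per statement and fills the value in for rows that omit the column. This is a hypothetical API, since Iceberg default values are literals today and expression defaults are only being discussed here:

import java.time.Instant;
import java.time.LocalDate;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Sketch only: resolving SQL-spec default value expressions at write time.
// Hypothetical API, not Iceberg's.
class ExpressionDefaults {
  static final Map<String, Supplier<Object>> EXPRESSIONS = Map.of(
      "current_timestamp", Instant::now,
      "current_date", LocalDate::now,
      "current_user", () -> System.getProperty("user.name"));

  // Resolve each column's default expression once per statement, so all
  // rows written by the statement share the same values.
  static Map<String, Object> bindForStatement(Map<String, String> columnToExpression) {
    return columnToExpression.entrySet().stream()
        .collect(Collectors.toMap(
            Map.Entry::getKey,
            e -> EXPRESSIONS.get(e.getValue()).get()));
  }
}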
>>>>>>> On Tue, Dec 9, 2025 at 2:22 PM Steven Wu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Ryan, thanks a lot for the feedback!
>>>>>>>>
>>>>>>>> Regarding the concern about reliable timestamps, we are not proposing to use timestamps for ordering. With NTP on modern computers, they are generally reliable enough for the intended use cases. Also, some environments may have a stronger clock service, like the Spanner TrueTime service <https://docs.cloud.google.com/spanner/docs/true-time-external-consistency>.
>>>>>>>>
>>>>>>>> > joining to timestamps from the snapshots metadata table.
>>>>>>>>
>>>>>>>> As you also mentioned, that depends on the snapshot history, which is often retained for only a few days for performance reasons.
>>>>>>>>
>>>>>>>> > embedding a timestamp in DML (like `current_timestamp`) rather than relying on an implicit one from table metadata.
>>>>>>>>
>>>>>>>> An explicit timestamp column adds more burden to application developers. While some databases require an explicit column in the schema, those databases provide triggers to auto-set the column value. For Iceberg, the snapshot timestamp is the closest to the trigger timestamp.
>>>>>>>>
>>>>>>>> Also, a timestamp set during computation (like streaming ingestion or a relatively long batch computation) doesn't capture the time the rows/files are added to the Iceberg table in a batch fashion.
>>>>>>>>
>>>>>>>> > And for those use cases, you could also keep a longer history of snapshot timestamps, like storing a catalog's event log for long-term access to timestamp info
>>>>>>>>
>>>>>>>> That is not really consumable by joining a regular table query with the catalog event log. I would also imagine the catalog event log is capped at a shorter retention (maybe a few months) compared to data retention (which could be a few years).
>>>>>>>>
>>>>>>>> On Tue, Dec 9, 2025 at 1:32 PM Ryan Blue <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I don't think it is a good idea to expose timestamps at the row level. Timestamps in metadata that would be carried down to the row level already confuse people who expect them to be useful or reliable rather than just for debugging. I think extending this to the row level would only make the problem worse.
>>>>>>>>>
>>>>>>>>> You can already get this information by projecting the last updated sequence number, which is reliable, and joining to timestamps from the snapshots metadata table. Of course, the drawback there is losing the timestamp information when snapshots expire, but since it isn't reliable anyway I'd be fine with that.
>>>>>>>>>
>>>>>>>>> Some of the use cases, like auditing and compliance, are probably better served by embedding a timestamp in DML (like `current_timestamp`) rather than relying on an implicit one from table metadata. And for those use cases, you could also keep a longer history of snapshot timestamps, like storing a catalog's event log for long-term access to timestamp info. I think that would be better than storing it at the row level.
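A sketch of the join-based workaround described above, using Spark SQL from Java. It assumes a hypothetical helper view, seq_by_snapshot, that maps sequence numbers to commit times; the exact columns exposed by snapshot metadata vary by Iceberg version, so treat that view as something you would build yourself:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch only: recover per-row timestamps while snapshots are retained.
// "seq_by_snapshot" is a hypothetical view mapping
// sequence_number -> committed_at, built from snapshot metadata.
class RowTimestampQuery {
  static Dataset<Row> rowsWithCommitTime(SparkSession spark) {
    return spark.sql(
        "SELECT t.*, s.committed_at AS last_updated_at " +
        "FROM db.tbl t " +
        "JOIN seq_by_snapshot s " +
        "  ON t._last_updated_sequence_number = s.sequence_number");
  }
}

As the thread notes, this only works while the relevant snapshots (or a copy of their timestamps) are still retained.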
>>>>>>>>> On Mon, Dec 8, 2025 at 3:46 PM Steven Wu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> For the V4 spec, I have a small proposal [1] to expose a row timestamp concept that can help with many use cases like temporal queries, latency tracking, TTL, auditing, and compliance.
>>>>>>>>>>
>>>>>>>>>> This *_last_updated_timestamp_ms* metadata column behaves very similarly to the *_last_updated_sequence_number* column for row lineage.
>>>>>>>>>>
>>>>>>>>>> - Initially, it inherits from the snapshot timestamp.
>>>>>>>>>> - During rewrite (like compaction), its values are persisted in the data files.
>>>>>>>>>>
>>>>>>>>>> Would love to hear what you think.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Steven
>>>>>>>>>>
>>>>>>>>>> [1] https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?usp=sharing
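A minimal sketch of those two rules, mirroring how _last_updated_sequence_number inheritance already works; the names below are illustrative, not from the proposal:

import java.util.OptionalLong;

// Sketch only: the proposed inheritance rules for the
// _last_updated_timestamp_ms metadata column.
class LastUpdatedTimestamp {
  // Reading: a missing (null) value in the data file means the row
  // inherits the timestamp of the snapshot that added the file.
  static long read(OptionalLong persistedMillis, long snapshotTimestampMillis) {
    return persistedMillis.orElse(snapshotTimestampMillis);
  }

  // Rewriting (e.g., compaction): materialize the inherited value so it
  // survives after the original snapshot expires.
  static long persistOnRewrite(OptionalLong persistedMillis, long originalSnapshotTimestampMillis) {
    return read(persistedMillis, originalSnapshotTimestampMillis);
  }
}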
