Thanks for the clarification, Ryan. For long-running streaming jobs that commit periodically, it is difficult to establish a single constant value of current_timestamp across all writer tasks for each commit cycle. I guess streaming writers may just need to write the wall-clock time when appending a row to a data file as the value for a current_timestamp default.
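To make the trade-off concrete, here is a minimal, purely illustrative Java sketch (not from the proposal; the map-based rows and the created_at field name are made up) of the two possible bindings: one constant per commit cycle, which all writer tasks would somehow have to agree on, versus a wall-clock value captured per row at append time.

import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Purely illustrative: rows modeled as maps; "created_at" is a made-up column name.
public class CurrentTimestampBindingSketch {
  public static void main(String[] args) {
    List<Map<String, Object>> rowsInCycle = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      Map<String, Object> row = new HashMap<>();
      row.put("id", i);
      rowsInCycle.add(row);
    }

    // Binding 1: evaluate the default once per commit cycle. All rows share one
    // value, but distributed writer tasks would have to agree on it, which is
    // the coordination problem described above.
    Instant cycleTimestamp = Instant.now();
    rowsInCycle.forEach(row -> row.put("created_at", cycleTimestamp));

    // Binding 2: evaluate it per row at append time (wall clock). No coordination
    // is needed, but values differ row by row and none of them is the eventual
    // Iceberg commit time.
    rowsInCycle.forEach(row -> row.put("created_at", Instant.now()));

    rowsInCycle.forEach(System.out::println);
  }
}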
On Fri, Dec 12, 2025 at 1:44 PM Ryan Blue <[email protected]> wrote:

> I don't think that every row would have a different value. That would be up to the engine, but I would expect engines to insert `CURRENT_TIMESTAMP` into the plan and then replace it with a constant, resulting in a consistent value for all rows.
>
> You're right that this would not necessarily be the commit time. But neither is the commit timestamp from Iceberg's snapshot. I'm not sure how we are going to define "good enough" for this purpose. I think at least `CURRENT_TIMESTAMP` has reliable and known behavior when you look at how it is handled in engines. And if you want the Iceberg timestamp, then use a periodic query of the snapshots table to keep track of them in a table you can join to. I don't think this rises to the need for a table feature unless we can guarantee that it is correct.
>
> On Fri, Dec 12, 2025 at 1:19 PM Steven Wu <[email protected]> wrote:
>
>> > Postgres `current_timestamp` captures the transaction start time [1, 2]. Should we extend the same semantic to Iceberg: all rows added in the same snapshot should have the same timestamp value?
>>
>> Let me clarify my last comment.
>>
>> created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
>>
>> Since Postgres current_timestamp captures the transaction start time, all rows added in the same insert transaction would have the same value as the transaction timestamp with the column definition above.
>>
>> If we extend a similar semantic to Iceberg, should all rows added in the same Iceberg transaction/snapshot have the same timestamp?
>>
>> Ryan, I understand your comment about using the current_timestamp expression as a column default value; you were thinking that the engine would set the column value to the wall-clock time when appending a row to a data file, right? Then almost every row would have a different timestamp value.
>>
>> On Fri, Dec 12, 2025 at 10:26 AM Steven Wu <[email protected]> wrote:
>>
>>> The `current_timestamp` expression may not always carry the right semantic for the use cases. E.g., latency tracking is interested in when records are added / committed to the table, not when the record was appended to an uncommitted data file in the processing engine. Record creation and Iceberg commit can be minutes or even hours apart.
>>>
>>> A row timestamp inherited from the snapshot timestamp has no overhead at the initial commit and very minimal storage overhead during file rewrite. A per-row current_timestamp would have distinct values for every row and more storage overhead.
>>>
>>> OLTP databases deal with small row-level transactions. Postgres `current_timestamp` captures the transaction start time [1, 2]. Should we extend the same semantic to Iceberg: all rows added in the same snapshot should have the same timestamp value?
>>>
>>> [1] https://www.postgresql.org/docs/current/functions-datetime.html
>>> [2] https://neon.com/postgresql/postgresql-date-functions/postgresql-current_timestamp
>>>
>>> On Thu, Dec 11, 2025 at 4:07 PM Micah Kornfield <[email protected]> wrote:
>>>
>>>>> Micah, are 1 and 2 the same? 3 is covered by this proposal. To support the created_by timestamp, we would need to implement the following row lineage behavior:
>>>>> * Initially, it inherits from the snapshot timestamp.
>>>>> * During rewrite (like compaction), it should be persisted into data files.
>>>>> * During update, it needs to be carried over from the previous row. This is similar to the row_id carry-over for row updates.
>>>>
>>>> Sorry for the shorthand. These are not the same:
>>>>
>>>> 1. Insertion time - the time the row was inserted.
>>>> 2. Created by - the system that created the record.
>>>> 3. Updated by - the system that last updated the record.
>>>>
>>>> Depending on the exact use case, these might or might not have utility. I'm just wondering if there will be more examples like this in the future.
>>>>
>>>>> The created_by column would likely incur significantly higher storage overhead compared to the updated_by column. As rows are updated over time, the cardinality for this column in data files can be high. Hence, the created_by column may not compress well. This is a similar problem for the row_id column. One side effect of enabling row lineage by default for V3 tables is the storage overhead of the row_id column after compaction, especially for narrow tables with few columns.
>>>>
>>>> I agree. I think this analysis also shows that some consumers of Iceberg might not necessarily want all of these columns, so we might want to make them configurable rather than mandating them for all tables. Ryan's thought on default values seems like it would solve the issues I was raising.
>>>>
>>>> Thanks,
>>>> Micah
>>>>
>>>> On Thu, Dec 11, 2025 at 3:47 PM Ryan Blue <[email protected]> wrote:
>>>>
>>>>> > An explicit timestamp column adds more burden to application developers. While some databases require an explicit column in the schema, those databases provide triggers to auto-set the column value. For Iceberg, the snapshot timestamp is the closest to the trigger timestamp.
>>>>>
>>>>> Since the use cases don't require an exact timestamp, this seems like the best solution to get what people want (an insertion timestamp) that has clear and well-defined behavior. Since `current_timestamp` is defined by the SQL spec, it makes sense to me that we could use it and have reasonable behavior.
>>>>>
>>>>> I've talked with Anton about this before and maybe he'll jump in on this thread. I think that we may need to extend default values to include default value expressions, like the `current_timestamp` that is allowed by the SQL spec. That would solve the problem as well as some others (like `current_date` or `current_user`) and would not create a potentially misleading (and heavyweight) timestamp feature in the format.
>>>>>
>>>>> > Also some environments may have a stronger clock service, like the Spanner TrueTime service.
>>>>>
>>>>> Even in cases like this, commit retries can reorder commits and make timestamps out of order. I don't think that we should be making guarantees or even exposing metadata that people might mistake as having those guarantees.
>>>>>
>>>>> On Tue, Dec 9, 2025 at 2:22 PM Steven Wu <[email protected]> wrote:
>>>>>
>>>>>> Ryan, thanks a lot for the feedback!
>>>>>>
>>>>>> Regarding the concern about reliable timestamps, we are not proposing to use timestamps for ordering. With NTP in modern computers, they are generally reliable enough for the intended use cases. Also, some environments may have a stronger clock service, like the Spanner TrueTime service <https://docs.cloud.google.com/spanner/docs/true-time-external-consistency>.
>>>>>>
>>>>>> > joining to timestamps from the snapshots metadata table.
>>>>>>
>>>>>> As you also mentioned, it depends on the snapshot history, which is often retained for only a few days due to performance reasons.
>>>>>>
>>>>>> > embedding a timestamp in DML (like `current_timestamp`) rather than relying on an implicit one from table metadata.
>>>>>>
>>>>>> An explicit timestamp column adds more burden to application developers. While some databases require an explicit column in the schema, those databases provide triggers to auto-set the column value. For Iceberg, the snapshot timestamp is the closest to the trigger timestamp.
>>>>>>
>>>>>> Also, a timestamp set during computation (like streaming ingestion or a relatively long batch computation) doesn't capture the time the rows/files are added to the Iceberg table in a batch fashion.
>>>>>>
>>>>>> > And for those use cases, you could also keep a longer history of snapshot timestamps, like storing a catalog's event log for long-term access to timestamp info
>>>>>>
>>>>>> That is not really consumable by joining a regular table query with the catalog event log. I would also imagine the catalog event log is capped at a shorter retention (maybe a few months) compared to data retention (which could be a few years).
>>>>>>
>>>>>> On Tue, Dec 9, 2025 at 1:32 PM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> I don't think it is a good idea to expose timestamps at the row level. Timestamps in metadata that would be carried down to the row level already confuse people who expect them to be useful or reliable rather than just for debugging. I think extending this to the row level would only make the problem worse.
>>>>>>>
>>>>>>> You can already get this information by projecting the last updated sequence number, which is reliable, and joining to timestamps from the snapshots metadata table. Of course, the drawback there is losing the timestamp information when snapshots expire, but since it isn't reliable anyway I'd be fine with that.
>>>>>>>
>>>>>>> Some of the use cases, like auditing and compliance, are probably better served by embedding a timestamp in DML (like `current_timestamp`) rather than relying on an implicit one from table metadata. And for those use cases, you could also keep a longer history of snapshot timestamps, like storing a catalog's event log for long-term access to timestamp info. I think that would be better than storing it at the row level.
>>>>>>>
>>>>>>> On Mon, Dec 8, 2025 at 3:46 PM Steven Wu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> For the V4 spec, I have a small proposal [1] to expose a row timestamp concept that can help with many use cases like temporal queries, latency tracking, TTL, auditing, and compliance.
>>>>>>>>
>>>>>>>> This *_last_updated_timestamp_ms* metadata column behaves very similarly to *_last_updated_sequence_number* for row lineage.
>>>>>>>>
>>>>>>>> - Initially, it inherits from the snapshot timestamp.
>>>>>>>> - During rewrite (like compaction), its values are persisted in the data files.
>>>>>>>>
>>>>>>>> Would love to hear what you think.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steven
>>>>>>>>
>>>>>>>> [1] https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?usp=sharing
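For anyone who wants to try the workaround Ryan describes upthread (projecting the last updated sequence number and joining it to snapshot commit times), here is a minimal, hedged sketch against the Iceberg Java API. The class and method names are made up for illustration; it assumes an already-loaded org.apache.iceberg.Table and only covers snapshots that have not yet been expired.

import java.util.Map;
import java.util.TreeMap;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

public class SequenceNumberToCommitTime {
  // Build a lookup from snapshot sequence number to commit timestamp (millis).
  // A projected _last_updated_sequence_number can then be joined against this
  // mapping (or a table materialized from it) to recover an approximate row
  // timestamp, for as long as the relevant snapshots are still retained.
  static Map<Long, Long> sequenceNumberToCommitMillis(Table table) {
    Map<Long, Long> mapping = new TreeMap<>();
    for (Snapshot snapshot : table.snapshots()) {
      mapping.put(snapshot.sequenceNumber(), snapshot.timestampMillis());
    }
    return mapping;
  }
}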
