> Other common audit fields that some people might want without keeping snapshot history:
> 1. Insertion time
> 2. Created by
> 3. Updated by
Micah, are 1 and 2 the same? 3 is covered by this proposal.

To support the created_by timestamp, we would need to implement the following row lineage behavior:

* Initially, it inherits from the snapshot timestamp.
* During rewrite (like compaction), it should be persisted into data files.
* During update, it needs to be carried over from the previous row. This is similar to the row_id carry-over for row updates.

The created_by column would likely incur significantly higher storage overhead than the updated_by column. As rows are updated over time, the cardinality of this column in a data file can be high, so the created_by column may not compress well. The row_id column has a similar problem: one side effect of enabling row lineage by default for V3 tables is the storage overhead of the row_id column after compaction, especially for narrow tables with few columns.

The updated_by column has very few distinct values in a data file and compresses very well in columnar format. Its storage overhead is minimal, and the same is true for the last_updated_sequence_number column.

The updated_by column is probably the more useful row metadata for the mentioned use cases: it captures the time when a row is added to or updated in the table. It also has minimal computation and storage overhead.

> This part of the discussion reminds me of my proposal a while back to keep expired snapshot metadata.

> Is there also a concern about storage, by adding so many metadata fields to each row? It does seem that if we can calculate it from existing fields like last_updated_sequence_number and joining with catalog information, it would be better for the spec.

Szehon, thanks a lot for bringing up the previous proposal. Yes, it can help retain the expired snapshots in separate storage (like the catalog). If an Iceberg table has streaming ingestion that commits every minute, 2 years of snapshots would amount to ~1M snapshots, so they would likely need to be persisted into blob storage, compacted, and maintained.

With separate catalog storage for the expired snapshots, how would users consume the information? Are we thinking about a new metadata table like $EXPIRED_SNAPSHOTS? A built-in metadata column (like _last_updated_sequence_number or _last_updated_timestamp_ms) is very easy to consume. For the last_updated_timestamp_ms column in this proposal, the storage overhead is very minimal, as described above.

On Thu, Dec 11, 2025 at 10:13 AM Szehon Ho <[email protected]> wrote:

>> this is not really consumable by joining the regular table query with the catalog event log. I would also imagine the catalog event log is capped at a shorter retention (maybe a few months) compared to data retention (could be a few years).
>
> This part of the discussion reminds me of my proposal a while back to keep expired snapshot metadata:
> https://lists.apache.org/thread/l9m0mp44byx9kzzzmolxnrdqlzbympb8
> I think it would have been nice to have this formally in Iceberg (a list of expired snapshots), but the discussion concluded that it's probably easier for the catalog to do. Maybe we could formalize in the catalog spec an optional list of expired snapshots?
>
> Is there also a concern about storage, by adding so many metadata fields to each row? It does seem that if we can calculate it from existing fields like last_updated_sequence_number and joining with catalog information, it would be better for the spec.
> Thanks
> Szehon
>
> On Thu, Dec 11, 2025 at 10:02 AM Micah Kornfield <[email protected]> wrote:
>
>>> An explicit timestamp column adds more burden to application developers. While some databases require an explicit column in the schema, those databases provide triggers to auto-set the column value. For Iceberg, the snapshot timestamp is the closest to the trigger timestamp.
>>
>> I wonder if we should be looking at maybe generalizing the audit column in Iceberg and letting this be configured at a table level. Other common audit fields that some people might want without keeping snapshot history:
>>
>> 1. Insertion time
>> 2. Created by
>> 3. Updated by
>>
>> On Tue, Dec 9, 2025 at 2:23 PM Steven Wu <[email protected]> wrote:
>>
>>> Ryan, thanks a lot for the feedback!
>>>
>>> Regarding the concern about reliable timestamps, we are not proposing using timestamps for ordering. With NTP on modern computers, they are generally reliable enough for the intended use cases. Also, some environments may have a stronger clock service, like the Spanner TrueTime service <https://docs.cloud.google.com/spanner/docs/true-time-external-consistency>.
>>>
>>> > joining to timestamps from the snapshots metadata table.
>>>
>>> As you also mentioned, it depends on the snapshot history, which is often retained for only a few days due to performance reasons.
>>>
>>> > embedding a timestamp in DML (like `current_timestamp`) rather than relying on an implicit one from table metadata.
>>>
>>> An explicit timestamp column adds more burden to application developers. While some databases require an explicit column in the schema, those databases provide triggers to auto-set the column value. For Iceberg, the snapshot timestamp is the closest to the trigger timestamp.
>>>
>>> Also, a timestamp set during computation (like streaming ingestion or a relatively long batch computation) doesn't capture the time the rows/files are added to the Iceberg table in a batch fashion.
>>>
>>> > And for those use cases, you could also keep a longer history of snapshot timestamps, like storing a catalog's event log for long-term access to timestamp info
>>>
>>> This is not really consumable by joining the regular table query with the catalog event log. I would also imagine the catalog event log is capped at a shorter retention (maybe a few months) compared to data retention (could be a few years).
>>>
>>> On Tue, Dec 9, 2025 at 1:32 PM Ryan Blue <[email protected]> wrote:
>>>
>>>> I don't think it is a good idea to expose timestamps at the row level. Timestamps in table metadata already confuse people who expect them to be useful or reliable rather than just for debugging. I think extending this to the row level would only make the problem worse.
>>>>
>>>> You can already get this information by projecting the last updated sequence number, which is reliable, and joining to timestamps from the snapshots metadata table. Of course, the drawback there is losing the timestamp information when snapshots expire, but since it isn't reliable anyway, I'd be fine with that.
>>>>
>>>> Some of the use cases, like auditing and compliance, are probably better served by embedding a timestamp in DML (like `current_timestamp`) rather than relying on an implicit one from table metadata.
>>>> And for those use cases, you could also keep a longer history of snapshot timestamps, like storing a catalog's event log for long-term access to timestamp info. I think that would be better than storing it at the row level.
>>>>
>>>> On Mon, Dec 8, 2025 at 3:46 PM Steven Wu <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> For the V4 spec, I have a small proposal [1] to expose a row timestamp concept that can help with many use cases like temporal queries, latency tracking, TTL, auditing, and compliance.
>>>>>
>>>>> This _last_updated_timestamp_ms metadata column behaves very similarly to the _last_updated_sequence_number column for row lineage:
>>>>>
>>>>> - Initially, it inherits from the snapshot timestamp.
>>>>> - During rewrite (like compaction), its values are persisted in the data files.
>>>>>
>>>>> Would love to hear what you think.
>>>>>
>>>>> Thanks,
>>>>> Steven
>>>>>
>>>>> [1] https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?usp=sharing
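
To make the inheritance and carry-over rules discussed in this thread concrete, here is a minimal, purely illustrative sketch. It is not Iceberg code: the Row class, the field names, and the functions below are invented for the example; only the three rules themselves come from the discussion above.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    row_id: int
    created_ts_ms: Optional[int]       # None => inherit from the file's snapshot timestamp
    last_updated_ts_ms: Optional[int]  # None => inherit from the file's snapshot timestamp


def read_ts(materialized: Optional[int], file_snapshot_ts_ms: int) -> int:
    # Rule 1: a null value is inherited from the timestamp of the snapshot that
    # added the data file, so nothing has to be written per row initially.
    return materialized if materialized is not None else file_snapshot_ts_ms


def rewrite(row: Row, file_snapshot_ts_ms: int) -> Row:
    # Rule 2: a rewrite such as compaction does not modify the row, so the
    # inherited values must be materialized ("persisted into data files").
    return Row(
        row_id=row.row_id,
        created_ts_ms=read_ts(row.created_ts_ms, file_snapshot_ts_ms),
        last_updated_ts_ms=read_ts(row.last_updated_ts_ms, file_snapshot_ts_ms),
    )


def update(row: Row, file_snapshot_ts_ms: int, new_snapshot_ts_ms: int) -> Row:
    # Rule 3 (needed only for a created-style timestamp): on update, the value is
    # carried over from the previous row version, like the _row_id carry-over,
    # while the last-updated timestamp simply takes the new snapshot's time.
    return Row(
        row_id=row.row_id,
        created_ts_ms=read_ts(row.created_ts_ms, file_snapshot_ts_ms),
        last_updated_ts_ms=new_snapshot_ts_ms,
    )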

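For comparison, here is a rough sketch of the join-based workaround Ryan describes (project _last_updated_sequence_number and join it to snapshot commit times). It assumes Spark with Iceberg, a table named db.tbl, and an engine that exposes the _last_updated_sequence_number metadata column; build_seq_to_commit_ts is a hypothetical stub, since how you map sequence numbers to commit times depends on the Iceberg/Spark version.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()


def build_seq_to_commit_ts(table_name: str):
    """Hypothetical helper: return (sequence_number, committed_at) pairs for the
    table's retained snapshots, e.g. via the Iceberg API or a metadata table,
    depending on what your Iceberg/Spark version exposes."""
    raise NotImplementedError("engine/version specific")


seq_to_ts = spark.createDataFrame(
    build_seq_to_commit_ts("db.tbl"), ["sequence_number", "committed_at"]
)
seq_to_ts.createOrReplaceTempView("snapshot_times")

# _last_updated_sequence_number is the row-lineage metadata column; whether it
# can be projected like this depends on the engine and table format version.
rows = spark.sql("""
    SELECT t.*, s.committed_at AS last_updated_at
    FROM db.tbl t
    LEFT JOIN snapshot_times s
      ON t._last_updated_sequence_number = s.sequence_number
""")
rows.show()

# Caveat raised in the thread: once the matching snapshots expire (often after
# only a few days), committed_at comes back null and the timestamp is lost.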