I've also been a bit confused by this edge case. Is {a: 1, b: NULL} the correct result for querying at "time" v3?
If so I agree it's a bit finicky for engines to produce that result. It can't be determined from the snapshot-log alone, IIUC they'll need to look at the metadata-log to find the active schema at that time. On Mon, May 12, 2025 at 3:42 PM Ryan Blue <rdb...@gmail.com> wrote: > Snapshots are created when data changes and there is no change to the data > tree at “time” v3. If you want to create new snapshots when the schema > changes it is alright to do it, but I don’t think that we need to require > it in the spec. > > Also, it isn’t clear to me why the time travel query would resolve * as a > and b when time traveling. That error shows that there is an inconsistent > schema between time(v3) and the current schema (where b comes from). What > happens when you run `SELECT * FROM t FOR TIMESTAMP time(v2)`? > > Ryan > > On Fri, May 9, 2025 at 4:03 AM Vladimir Ozerov <voze...@querifylabs.com> > wrote: > >> Hi, >> >> When working on a time-travel in Trino I noticed the behavior which seems >> counter-intuitive from the user perspective. Consider the following >> sequence of SQL queries executed from Trino (could be any other query >> engine). Here we create a table, populate it with some data, and then add >> another column. The third operation is implemented in Trino via >> Table.updateSchema().addColumn(...).commit(). >> >> CREATE TABLE t (a); // v1 >> INSERT INTO t VALUES (1); // v2 >> ALTER TABLE t ADD COLUMN b // v3 >> UPDATE t SET b = 2; // v4 >> >> The trick is that the third operation creates a schema, but *doesn't* >> create a snapshot. After that we endup with the following table metadata: >> { >> snapshots: { >> v1: { schemaId: 0 } // Initial create >> v2: { schemaId: 0 } // Insert using the old schema >> v4: { schemaId: 1 } // Update using the new schema >> } >> schemaIds: { 0, 1 } >> currentSnapshot: v4 >> currentSchema: 1 >> } >> >> What happens next in Trino: >> >> SELECT * FROM t; // Executed on { snapshot=v4, >> schemaId=1 }, returns { a=1, b=2 } as expected >> SELECT * FROM t FOR TIMESTAMP v3; // Executed on { snapshot=v2, >> schemaId=0 }, throws exception: column "b" is not found >> >> That is, we cannot query the table after the new column is added, but >> before it is updated. This happens because there is no dedicated snapshot >> for UpdateSchema action. To make things even more confusing, if we do not >> have the operation v4 then we will be able to query the v3 state, because >> engines will basically "reconstruct" it as a virtual snapshot using >> TableMetadata.currentSnapshot and TableMetadata.currentSchema: >> >> SELECT * FROM t; // Executed on { snapshot=v2, >> schemaId=1 }, returns { a=1, b=null } as expected >> SELECT * FROM t FOR TIMESTAMP v3; // Executed on { snapshot=v2, >> schemaId=0 }, throws exception: column "b" is not found >> >> My question is: shouldn't we always implicitly add new snapshots for all >> schema changes? Because otherwise we cannot read all existing table states >> which is confusing. Of course, this could be handled on the engine side, >> but this will lead to inconsistent behaviour between different engines >> depending on their internal implementation details. >> >> Regards, >> -- >> *Vladimir Ozerov* >> >