I've also been a bit confused by this edge case. Is  {a: 1, b: NULL} the
correct result for querying at "time" v3?

If so I agree it's a bit finicky for engines to produce that result. It
can't be determined from the snapshot-log alone, IIUC they'll need to look
at the metadata-log to find the active schema at that time.

On Mon, May 12, 2025 at 3:42 PM Ryan Blue <rdb...@gmail.com> wrote:

> Snapshots are created when data changes and there is no change to the data
> tree at “time” v3. If you want to create new snapshots when the schema
> changes it is alright to do it, but I don’t think that we need to require
> it in the spec.
>
> Also, it isn’t clear to me why the time travel query would resolve * as a
> and b when time traveling. That error shows that there is an inconsistent
> schema between time(v3) and the current schema (where b comes from). What
> happens when you run `SELECT * FROM t FOR TIMESTAMP time(v2)`?
>
> Ryan
>
> On Fri, May 9, 2025 at 4:03 AM Vladimir Ozerov <voze...@querifylabs.com>
> wrote:
>
>> Hi,
>>
>> When working on a time-travel in Trino I noticed the behavior which seems
>> counter-intuitive from the user perspective. Consider the following
>> sequence of SQL queries executed from Trino (could be any other query
>> engine). Here we create a table, populate it with some data, and then add
>> another column. The third operation is implemented in Trino via
>> Table.updateSchema().addColumn(...).commit().
>>
>> CREATE TABLE t (a);        // v1
>> INSERT INTO t VALUES (1);  // v2
>> ALTER TABLE t ADD COLUMN b // v3
>> UPDATE t SET b = 2;        // v4
>>
>> The trick is that the third operation creates a schema, but *doesn't*
>> create a snapshot. After that we endup with the following table metadata:
>> {
>>     snapshots: {
>>         v1: { schemaId: 0 } // Initial create
>>         v2: { schemaId: 0 } // Insert using the old schema
>>         v4: { schemaId: 1 } // Update using the new schema
>>     }
>>     schemaIds: { 0, 1 }
>>     currentSnapshot: v4
>>     currentSchema: 1
>> }
>>
>> What happens next in Trino:
>>
>> SELECT * FROM t;                  // Executed on { snapshot=v4,
>> schemaId=1 }, returns { a=1, b=2 } as expected
>> SELECT * FROM t FOR TIMESTAMP v3; // Executed on { snapshot=v2,
>> schemaId=0 }, throws exception: column "b" is not found
>>
>> That is, we cannot query the table after the new column is added, but
>> before it is updated. This happens because there is no dedicated snapshot
>> for UpdateSchema action. To make things even more confusing, if we do not
>> have the operation v4 then we will be able to query the v3 state, because
>> engines will basically "reconstruct" it as a virtual snapshot using
>> TableMetadata.currentSnapshot and TableMetadata.currentSchema:
>>
>> SELECT * FROM t;                  // Executed on { snapshot=v2,
>> schemaId=1 }, returns { a=1, b=null } as expected
>> SELECT * FROM t FOR TIMESTAMP v3; // Executed on { snapshot=v2,
>> schemaId=0 }, throws exception: column "b" is not found
>>
>> My question is: shouldn't we always implicitly add new snapshots for all
>> schema changes? Because otherwise we cannot read all existing table states
>> which is confusing. Of course, this could be handled on the engine side,
>> but this will lead to inconsistent behaviour between different engines
>> depending on their internal implementation details.
>>
>> Regards,
>> --
>> *Vladimir Ozerov*
>>
>

Reply via email to