kylincode opened a new issue, #6423:
URL: https://github.com/apache/hudi/issues/6423
After schema evolution, a time travel query on historical data returns
results with the latest schema instead of the historical schema.
Steps to reproduce the behavior:
1. Create table t1 with Spark SQL:
`create table t1 (
id int,
name string,
price double,
ts long
) using hudi
location '/tmp/t1'
options (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts'
);`
2. Insert data:
`insert into t1 values(1,'Tom',0.9,1000);`
3. Drop the price column:
`alter table t1 drop column price;`
4. Run a time travel query:
`select * from t1 timestamp as of '20220817161104255'`
The time travel query returns results with the latest schema (only the id,
name, and ts columns) instead of the historical schema.
**Expected behavior**
A time travel query on historical data should return results with the schema
that was in effect at the queried point in time.
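
To make the expectation concrete, here is a toy model in plain Python. It is not Hudi code and does not reflect how Hudi stores or resolves schemas; it only illustrates the rule "use the newest schema whose instant is not later than the queried instant". The `SCHEMA_HISTORY` list and the `schema_as_of` helper are hypothetical names, the first instant is assumed to be the insert commit, and the second instant is made up to stand in for the `alter table` commit.

```python
from bisect import bisect_right

# Hypothetical schema history, sorted by instant (yyyyMMddHHmmssSSS strings).
# First entry: the timestamp queried in this report, assumed to be the insert commit.
# Second entry: an invented instant standing in for the ALTER TABLE commit.
SCHEMA_HISTORY = [
    ("20220817161104255", ["id", "name", "price", "ts"]),
    ("20220817161500000", ["id", "name", "ts"]),
]


def schema_as_of(history, instant):
    """Return the columns of the newest schema whose instant is <= `instant`."""
    # Fixed-width digit strings compare correctly as plain strings.
    instants = [entry_instant for entry_instant, _ in history]
    pos = bisect_right(instants, instant)
    if pos == 0:
        raise ValueError(f"no schema exists at or before instant {instant}")
    return history[pos - 1][1]


# Expected: the historical schema, including the dropped price column.
historical = schema_as_of(SCHEMA_HISTORY, "20220817161104255")

# What the query currently returns instead: the latest schema.
latest = SCHEMA_HISTORY[-1][1]
```

Under this model, the query in step 4 would be expected to expose `price`, while the reported behavior matches `latest`.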
**Environment Description**
* Hudi version : 0.11.0
* Spark version : 3.2.2
* Storage (HDFS/S3/GCS..) : local filesystem (macOS)
* Running on Docker? (yes/no) : no