duanyongvictory opened a new issue, #6967:
URL: https://github.com/apache/hudi/issues/6967
spark 3.2.2
hudi 0.11.1
prestodb 0.274
hive 2.3.5
The following steps reproduce the bad read. These are run with Spark:
set hoodie.schema.on.read.enable=true;
create table p1011 (
id string,
f1 string,
f2 string,
ts bigint
) using hudi
tblproperties (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts',
'parquet.column.index.access'='true'
);
insert into p1011 select '01','f1_v','f2_v',1657608295538000;
insert into p1011 select '02','f1_v','f2_v',1657628170556000;
ALTER TABLE p1011 ADD COLUMNS(f3 string);
insert into p1011 select '03','f1_v','f2_v',1657628170776000, 'f3_v';
ALTER TABLE p1011 RENAME COLUMN f3 TO f3_new;
Querying with spark-sql returns the correct result:
> select id, f2, ts, f3_new from p1011;
01 f2_v 1657608295538000 NULL
02 f2_v 1657628170556000 NULL
03 f2_v 1657628170776000 f3_v
But querying the same table with prestodb fails:
Query 20221017_013025_00001_wqfhy failed: org.apache.hadoop.io.Text cannot
be cast to org.apache.hadoop.io.LongWritable
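One plausible reading of this error (a hypothetical illustration, not actual Presto/Hive internals): with 'parquet.column.index.access'='true', the Hive Parquet reader resolves requested columns by ordinal position rather than by name, so if the reader-side column order drifts from the physical file order after schema evolution, a string column can be handed to a reader expecting a bigint. The schemas and the resolve_by_index function below are assumptions made up for the sketch; suppose the reader's view places the renamed column f3_new before ts while the file written after ADD COLUMNS stores f3 last:

```python
# Hypothetical sketch: index-based column resolution mis-typing a column
# after schema evolution. Not Presto/Hudi code.

# Physical column order in a data file written after ADD COLUMNS(f3 string).
file_schema = [("id", "string"), ("f1", "string"), ("f2", "string"),
               ("ts", "bigint"), ("f3", "string")]

# Assumed reader-side order that has drifted: f3_new precedes ts.
reader_schema = ["id", "f1", "f2", "f3_new", "ts"]

def resolve_by_index(column):
    """Index-based access: the i-th reader column maps to the i-th file column."""
    i = reader_schema.index(column)
    return file_schema[i] if i < len(file_schema) else None

# The reader asks for "ts" (bigint) at ordinal 4, but ordinal 4 in the file
# is the string column "f3" -- a string value handed to a long reader would
# surface as "Text cannot be cast to LongWritable".
print(resolve_by_index("ts"))
```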
After changing the table property via Spark:
alter table p1011 set tblproperties ('parquet.column.index.access'='false');
prestodb no longer errors, but it returns an incorrect result:
id | f2 | ts | f3_new
----+------+------------------+--------
01 | f2_v | 1657608295538000 | NULL
02 | f2_v | 1657628170556000 | NULL
03 | f2_v | 1657628170776000 | NULL
For id=03, the f3_new value should be f3_v, not NULL.
Could anyone help? Thanks a lot.
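The NULL for f3_new is consistent with name-based resolution: the file holding row 03 was written after ADD COLUMNS but before RENAME, so it stores the column under its old name f3. A reader that looks columns up by name finds no column called f3_new and fills in NULL. A minimal sketch of that lookup (hypothetical, not Presto code; the file_columns dict mirrors the row inserted above):

```python
# Hypothetical sketch: name-based column resolution missing a renamed column.
# The file for row 03 was written before RENAME, so it only knows "f3".
file_columns = {"id": "03", "f1": "f1_v", "f2": "f2_v",
                "ts": 1657628170776000, "f3": "f3_v"}

def resolve_by_name(column):
    """parquet.column.index.access=false: look the column up by name."""
    return file_columns.get(column)  # a missing name surfaces as NULL

print(resolve_by_name("f3_new"))  # None: the file has no column "f3_new"
```

This would also explain why Spark reads correctly: with hoodie.schema.on.read.enable=true, Hudi's own schema history can map the new name f3_new back to the stored name f3, while a reader unaware of that history cannot.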
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]