duanyongvictory opened a new issue, #6967:
URL: https://github.com/apache/hudi/issues/6967
spark 3.2.2
hudi 0.11.1
prestodb 0.274
hive 2.3.5
The following steps reproduce the bad read. These are run with Spark:
set hoodie.schema.on.read.enable=true;
create table p1011 (
id string,
f1 string,
f2 string,
ts bigint
) using hudi
tblproperties (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts',
'parquet.column.index.access'='true'
);
insert into p1011 select '01','f1_v','f2_v',1657608295538000;
insert into p1011 select '02','f1_v','f2_v',1657628170556000;
ALTER TABLE p1011 ADD COLUMNS(f3 string);
insert into p1011 select '03','f1_v','f2_v',1657628170776000, 'f3_v';
ALTER TABLE p1011 RENAME COLUMN f3 TO f3_new;
Querying with spark-sql returns the correct result:
> select id, f2, ts, f3_new from p1011;
01 f2_v 1657608295538000 NULL
02 f2_v 1657628170556000 NULL
03 f2_v 1657628170776000 f3_v
But querying the same table with prestodb fails:
Query 20221017_013025_00001_wqfhy failed: org.apache.hadoop.io.Text cannot
be cast to org.apache.hadoop.io.LongWritable
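One plausible reading of this error (a hypothetical illustration, not actual Presto/Hive internals): with 'parquet.column.index.access'='true', the Hive Parquet reader resolves requested columns by ordinal position rather than by name, so if the reader-side column order drifts from the physical file order after schema evolution, a string column can be handed to a reader expecting a bigint. The schemas and the resolve_by_index function below are assumptions made up for the sketch; suppose the reader's view places the renamed column f3_new before ts while the file written after ADD COLUMNS stores f3 last:

```python
# Hypothetical sketch: index-based column resolution mis-typing a column
# after schema evolution. Not Presto/Hudi code.

# Physical column order in a data file written after ADD COLUMNS(f3 string).
file_schema = [("id", "string"), ("f1", "string"), ("f2", "string"),
               ("ts", "bigint"), ("f3", "string")]

# Assumed reader-side order that has drifted: f3_new precedes ts.
reader_schema = ["id", "f1", "f2", "f3_new", "ts"]

def resolve_by_index(column):
    """Index-based access: the i-th reader column maps to the i-th file column."""
    i = reader_schema.index(column)
    return file_schema[i] if i < len(file_schema) else None

# The reader asks for "ts" (bigint) at ordinal 4, but ordinal 4 in the file
# is the string column "f3" -- a string value handed to a long reader would
# surface as "Text cannot be cast to LongWritable".
print(resolve_by_index("ts"))
```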
After changing the table property via Spark:
alter table p1011 set tblproperties ('parquet.column.index.access'='false');
prestodb no longer errors, but it returns an incorrect result:
id | f2 | ts | f3_new
----+------+------------------+--------
01 | f2_v | 1657608295538000 | NULL
02 | f2_v | 1657628170556000 | NULL
03 | f2_v | 1657628170776000 | NULL
For id=03, the f3_new value should be f3_v, not NULL.
Could anyone help? Thanks a lot.
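The NULL for f3_new is consistent with name-based resolution: the file holding row 03 was written after ADD COLUMNS but before RENAME, so it stores the column under its old name f3. A reader that looks columns up by name finds no column called f3_new and fills in NULL. A minimal sketch of that lookup (hypothetical, not Presto code; the file_columns dict mirrors the row inserted above):

```python
# Hypothetical sketch: name-based column resolution missing a renamed column.
# The file for row 03 was written before RENAME, so it only knows "f3".
file_columns = {"id": "03", "f1": "f1_v", "f2": "f2_v",
                "ts": 1657628170776000, "f3": "f3_v"}

def resolve_by_name(column):
    """parquet.column.index.access=false: look the column up by name."""
    return file_columns.get(column)  # a missing name surfaces as NULL

print(resolve_by_name("f3_new"))  # None: the file has no column "f3_new"
```

This would also explain why Spark reads correctly: with hoodie.schema.on.read.enable=true, Hudi's own schema history can map the new name f3_new back to the stored name f3, while a reader unaware of that history cannot.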
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]