WilliamShine opened a new issue #4816:
URL: https://github.com/apache/hudi/issues/4816


   Running this SQL in Flink throws an NPE:
   
   CREATE TABLE `account`(
     `_hoodie_commit_time` string, 
     `_hoodie_commit_seqno` string, 
     `_hoodie_record_key` string, 
     `_hoodie_partition_path` string, 
     `_hoodie_file_name` string, 
     `_ts_ms` bigint, 
     `_op` string, 
     `_hoodie_is_deleted` boolean, 
     `id` int, 
     `val` int, 
     `created_at` bigint, 
     `hh` string,
     `dt` string,
     PRIMARY KEY (`id`) NOT ENFORCED)
   PARTITIONED BY (`dt`)
   WITH (
     'connector' = 'hudi',
     'path' = 's3://de-hive-test/ods_test_debezium_nick.db/test_ods_monitor1',
     'table.type' = 'MERGE_ON_READ'
   );
   CREATE TABLE IF NOT EXISTS `printTable` (
     `_hoodie_commit_time` string, 
     `_hoodie_commit_seqno` string, 
     `_hoodie_record_key` string, 
     `_hoodie_partition_path` string, 
     `_hoodie_file_name` string, 
     `_ts_ms` bigint, 
     `_op` string, 
     `_hoodie_is_deleted` boolean, 
     `id` int, 
     `val` int, 
     `created_at` bigint, 
     `hh` string,
     `dt` string
   ) WITH (
     'connector' = 'print'
   );
   INSERT INTO printTable SELECT * FROM account;
   
   
   Why does MergeOnReadInputFormat.getRequiredPosWithCommitTime add the '_hoodie_commit_time' field to the schema fields?
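
   Below is a minimal, self-contained sketch of the duplication I suspect; the class name, helper signature, and field positions are assumptions for illustration only, not Hudi's actual implementation:

   // Sketch only: a hypothetical required-positions computation that always
   // prepends the commit-time column (assumed to sit at position 0).
   import java.util.Arrays;

   public class RequiredPosSketch {

     static int[] getRequiredPosWithCommitTime(int[] requiredPos) {
       int[] withCommitTime = new int[requiredPos.length + 1];
       withCommitTime[0] = 0; // always read '_hoodie_commit_time'
       System.arraycopy(requiredPos, 0, withCommitTime, 1, requiredPos.length);
       return withCommitTime;
     }

     public static void main(String[] args) {
       // SELECT * already projects every column, including position 0.
       int[] selectStar = {0, 1, 2, 3, 4};
       // Prints [0, 0, 1, 2, 3, 4]: '_hoodie_commit_time' is requested twice.
       System.out.println(Arrays.toString(getRequiredPosWithCommitTime(selectStar)));
     }
   }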
   
   If the SQL already declares a '_hoodie_commit_time' column, the read schema becomes '_hoodie_commit_time', '_hoodie_commit_time', '_hoodie_commit_seqno', ..., so columnReaders[i].readToVector(num, writableVectors[i]) in ParquetColumnarRowSplitReader.nextBatch processes that column twice.
   
   pageReader.readPage in AbstractColumnReader.readToVector will therefore read the '_hoodie_commit_time' column twice, but in Parquet 1.11 ColumnChunkPageReadStore.readPage does: DataPage compressedPage = compressedPages.poll();
   
   On the second read, compressedPage is null, and finally the NPE happens.
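
   Here is a standalone sketch of that failure mechanism; the Page class and page count are placeholders rather than Parquet's real classes, but the null-on-empty behaviour of Deque.poll() is exactly what happens once the page queue has already been drained:

   // Sketch only: a page store whose readPage() polls a queue of pages,
   // similar to what the issue describes for ColumnChunkPageReadStore.
   import java.util.ArrayDeque;
   import java.util.Deque;

   public class PageQueueSketch {

     // Stand-in for a Parquet data page; the value count is arbitrary.
     static class Page {
       int valueCount() { return 1024; }
     }

     public static void main(String[] args) {
       Deque<Page> compressedPages = new ArrayDeque<>();
       compressedPages.add(new Page()); // the column chunk holds one page

       Page first = compressedPages.poll();     // first read: returns the page
       System.out.println(first.valueCount());

       Page second = compressedPages.poll();    // second read of the same column: queue is empty, poll() returns null
       System.out.println(second.valueCount()); // NullPointerException here
     }
   }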
   
   Can you tell me why the '_hoodie_commit_time' column is added by default?
   
   
   
   

