wuwenchi created HUDI-4527:
------------------------------

             Summary: wrong data after compaction in MOR table by flink
                 Key: HUDI-4527
                 URL: https://issues.apache.org/jira/browse/HUDI-4527
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink, flink-sql
            Reporter: wuwenchi


Consider a MOR table with a primary key and a preCombine field. After 
compaction, a parquet file is generated. Inserting one more record then 
generates a log file.

If the preCombine value in the log file is smaller than the one in the parquet 
file, the record in the parquet file should be returned on read. Currently, 
the record in the log file is returned instead, which is incorrect. A table in 
COW mode does not have this problem.

For example:

create table t1(
  uuid int,
  ts int,
  PRIMARY KEY(uuid) NOT ENFORCED
) with (
  'connector' = 'hudi',
  'read.data.skipping.enabled' = 'true',
  'write.precombine' = 'true',
  'hoodie.datasource.write.recordkey.field' = 'uuid',
  'path' = '........',
  'table.type' = 'MERGE_ON_READ',
  'compaction.delta_commits' = '2',
  'hoodie.compact.inline' = 'true'
);

insert into t1 values(1, 1);      -- deltacommit

insert into t1 values(1, 100);    -- deltacommit --> compaction --> parquet

insert into t1 values(1, 2);      -- deltacommit

select * from t1;

This returns [+I[1, 2]], but in COPY_ON_WRITE mode it returns [+I[1, 100]].
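
The expected read-time behavior can be illustrated with a minimal sketch. This 
is not Hudi's actual merge code; Record, merge and read_snapshot are 
hypothetical names, and it assumes that ts is the preCombine field and that 
the incoming record wins on equal values.

```python
from typing import Iterable, List, NamedTuple, Optional


class Record(NamedTuple):
    uuid: int  # record key
    ts: int    # assumed preCombine (ordering) field


def merge(base: Optional[Record], incoming: Record) -> Record:
    """Keep the record with the larger ts; incoming wins ties (assumption)."""
    if base is None or incoming.ts >= base.ts:
        return incoming
    return base


def read_snapshot(base_file: Iterable[Record],
                  log_records: Iterable[Record]) -> List[Record]:
    """Merge log records onto the compacted base file, key by key."""
    merged = {rec.uuid: rec for rec in base_file}
    for rec in log_records:
        merged[rec.uuid] = merge(merged.get(rec.uuid), rec)
    return sorted(merged.values())


# State after the three inserts above: the compacted parquet file holds
# (1, 100) and the log file holds (1, 2).
result = read_snapshot([Record(1, 100)], [Record(1, 2)])
print(result)  # [Record(uuid=1, ts=100)]
```

Under these assumptions the log record (1, 2) loses to the parquet record 
(1, 100), which matches the COPY_ON_WRITE result.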


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
