[jira] [Created] (HUDI-6410) MERGE INTO giving duplicate rows even if table have precombineKey

Aditya Goenka (Jira) Mon, 19 Jun 2023 02:49:03 -0700

Aditya Goenka created HUDI-6410:
-----------------------------------

             Summary: MERGE INTO giving duplicate rows even if table have 
precombineKey
                 Key: HUDI-6410
                 URL: https://issues.apache.org/jira/browse/HUDI-6410
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Aditya Goenka
         Attachments: image-2023-06-19-15-16-58-055.png


MERGE INTO writes duplicate rows for the same primary key when the source contains several rows with that key, even though the table defines a precombineKey.

 

Example:

spark-sql> create table spark_mor_no_pre_t5 (
         >     id int,
         >     name string,
         >     updated_at timestamp
         > ) using hudi
         > options (
         >     type = 'mor',
         >     primaryKey = 'id',
         >     precombineKey = 'updated_at'
         > ) location 'file:///tmp/output/spark_mor_no_pre_t4';
Time taken: 0.363 seconds

spark-sql> merge into spark_mor_no_pre_t5 as target
         > using (
         >     select 1 as id, 'c' as name, current_timestamp as updated_at
         >     union select 1 as id, 'd' as name, current_timestamp as updated_at
         >     union select 1 as id, 'e' as name, current_timestamp as updated_at
         > ) source
         > on target.id = source.id
         > when matched then update set *
         > when not matched then insert *;
Time taken: 3.111 seconds

spark-sql> select * from spark_mor_no_pre_t5;
20230619151501003 20230619151501003_0_0 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
20230619151501003 20230619151501003_0_1 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
20230619151501003 20230619151501003_0_2 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
Time taken: 0.288 seconds, Fetched 3 row(s)
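
A possible workaround until this is fixed (a sketch only, not verified against a particular Hudi release) is to collapse the source to one row per id before the merge, so that MERGE INTO never sees duplicate keys. The aliases raw_source and ranked and the column rn are hypothetical names introduced for illustration. The secondary ordering on name is likewise an assumption: it only makes the choice deterministic when updated_at ties, as it does here because all three rows use current_timestamp.

```sql
-- Workaround sketch: keep only the row with the greatest updated_at per id
-- in the source, then run the same MERGE INTO as in the report.
merge into spark_mor_no_pre_t5 as target
using (
    select id, name, updated_at
    from (
        select
            id,
            name,
            updated_at,
            row_number() over (
                partition by id
                -- name desc is only a tie-breaker (assumption, not Hudi behavior)
                order by updated_at desc, name desc
            ) as rn
        from (
            select 1 as id, 'c' as name, current_timestamp as updated_at
            union select 1 as id, 'd' as name, current_timestamp as updated_at
            union select 1 as id, 'e' as name, current_timestamp as updated_at
        ) raw_source
    ) ranked
    where rn = 1
) source
on target.id = source.id
when matched then update set *
when not matched then insert *;
```

With this source the subquery yields exactly one row for id = 1, so the merge has a single candidate per key regardless of how precombine is applied.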

 

GitHub issue: https://github.com/apache/hudi/issues/8916



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
