[ 
https://issues.apache.org/jira/browse/HUDI-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka closed HUDI-6410.
-------------------------------
    Resolution: Fixed

It was not an issue; the behavior was caused by a typo.
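
The message does not say which token was mistyped. One plausible candidate, stated here purely as an assumption, is the table option `precombineKey` in the quoted repro: the option name given in Hudi's Spark SQL documentation is `preCombineField`. Under that assumption, a corrected DDL might look like the sketch below (the table name and location are hypothetical):

```sql
-- Sketch only: assumes the typo was the option name `precombineKey`.
-- `preCombineField` is the option name documented for Hudi Spark SQL tables.
create table hudi_mor_example (
    id int,
    name string,
    updated_at timestamp
) using hudi
options (
    type = 'mor',
    primaryKey = 'id',
    preCombineField = 'updated_at'
) location 'file:///tmp/output/hudi_mor_example';
```

If the precombine field is recognized, source rows that share the same `id` would be expected to collapse into a single row, keeping the one with the largest `updated_at`.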

> MERGE INTO giving duplicate rows even if table have precombineKey
> -----------------------------------------------------------------
>
>                 Key: HUDI-6410
>                 URL: https://issues.apache.org/jira/browse/HUDI-6410
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Aditya Goenka
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: image-2023-06-19-15-16-58-055.png, 
> image-2023-06-19-15-37-27-202.png
>
>
> MERGE INTO produces duplicate rows even though a precombine key is defined on the table.
>  
> Example -
> spark-sql> create table spark_mor_no_pre_t5 (
>          >     id int,
>          >     name string,
>          >     updated_at timestamp
>          > ) using hudi
>          > options (
>          >     type = 'mor',
>          >     primaryKey = 'id',
>          >     precombineKey = 'updated_at'
>          > ) location 'file:///tmp/output/spark_mor_no_pre_t4';
> Time taken: 0.363 seconds
> spark-sql> 
>          > merge into spark_mor_no_pre_t5 as target
>          > using (
>          >     select 1 as id, 'c' as name, current_timestamp as updated_at
>          >     union select 1 as id, 'd' as name, current_timestamp as updated_at
>          >     union select 1 as id, 'e' as name, current_timestamp as updated_at
>          > ) source
>          > on target.id = source.id
>          > when matched then update set *
>          > when not matched then insert *;
> Time taken: 3.111 seconds
> spark-sql> select * from spark_mor_no_pre_t5;
> 20230619151501003 20230619151501003_0_0 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
> 20230619151501003 20230619151501003_0_1 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
> 20230619151501003 20230619151501003_0_2 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
> Time taken: 0.288 seconds, Fetched 3 row(s)
>  
> GitHub issue - [https://github.com/apache/hudi/issues/8916]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
