[ 
https://issues.apache.org/jira/browse/HUDI-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17734099#comment-17734099
 ] 

Aditya Goenka commented on HUDI-6410:
-------------------------------------

There was a problem in the analysis: the table option was misspelled as
precombineKey. It should be preCombineField. With the correct option name,
MERGE INTO works as expected. Closing this JIRA.

spark-sql> create table spark_mor_no_pre_t8 (
         >     id int,
         >     name string,
         >     updated_at timestamp
         > ) using hudi
         > options (
         >     type = 'mor',
         >     primaryKey = 'id',
         >     preCombineField = 'updated_at'
         > ) location 'file:///tmp/output/spark_mor_no_pre_t8';
Time taken: 0.271 seconds
spark-sql> merge into spark_mor_no_pre_t8 as target
         > using (
         >     select 1 as id, 'c' as name, current_timestamp as updated_at
         > union select 1 as id,'d' as name, current_timestamp as updated_at
         > union select 1 as id,'e' as name, current_timestamp as updated_at
         > ) source
         > on target.id = source.id
         > when matched then update set *
         > when not matched then insert *;
23/06/19 15:32:15 WARN HoodieBackedTableMetadata: Metadata table was not found
at path file:/tmp/output/spark_mor_no_pre_t8/.hoodie/metadata
Time taken: 3.903 seconds
spark-sql> select * from spark_mor_no_pre_t8;
20230619153215056 20230619153215056_0_0 1
06d12bb0-6bf9-4389-8fee-96fabc2a8c14-0_0-81-78_20230619153215056.parquet 1 e
2023-06-19 15:32:15.151468
Time taken: 0.36 seconds, Fetched 1 row(s)
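
As an additional check (not part of the original session), a query along
these lines could confirm that no primary key appears more than once after
the merge. It is only a sketch and uses just the table and column names from
the example above; an empty result would mean there are no duplicate ids.

```sql
-- Sketch: list any id that occurs more than once in the table.
-- Not run as part of the original session.
select id, count(*) as row_count
from spark_mor_no_pre_t8
group by id
having count(*) > 1;
```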

> MERGE INTO giving duplicate rows even if table have precombineKey
> -----------------------------------------------------------------
>
>                 Key: HUDI-6410
>                 URL: https://issues.apache.org/jira/browse/HUDI-6410
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Aditya Goenka
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: image-2023-06-19-15-16-58-055.png, 
> image-2023-06-19-15-37-27-202.png
>
>
> Merge into is giving duplicate rows even if precombine key is there. 
>  
> Example -
> spark-sql> create table spark_mor_no_pre_t5 (
>          >     id int,
>          > name string,
>          > updated_at timestamp
>          > ) using hudi
>          > options (
>          >     type = 'mor',
>          >     primaryKey = 'id',
>          >     precombineKey = 'updated_at'
>          > ) location 'file:///tmp/output/spark_mor_no_pre_t4';
> Time taken: 0.363 seconds
> spark-sql> 
>          > merge into spark_mor_no_pre_t5 as target
>          > using (
>          >     select 1 as id, 'c' as name, current_timestamp as updated_at
>          > union select 1 as id,'d' as name, current_timestamp as updated_at
>          > union select 1 as id,'e' as name, current_timestamp as updated_at
>          > ) source
>          > on target.id = source.id
>          > when matched then update set *
>          > when not matched then insert *;
> Time taken: 3.111 seconds
> spark-sql> select * from spark_mor_no_pre_t5;
> 20230619151501003 20230619151501003_0_0 1 
> 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 
> e 2023-06-19 15:15:01.032766
> 20230619151501003 20230619151501003_0_1 1 
> 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 
> e 2023-06-19 15:15:01.032766
> 20230619151501003 20230619151501003_0_2 1 
> 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 
> e 2023-06-19 15:15:01.032766
> Time taken: 0.288 seconds, Fetched 3 row(s)
>  
> Github Issue - [https://github.com/apache/hudi/issues/8916]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
