Aditya Goenka created HUDI-6410:
-----------------------------------
Summary: MERGE INTO gives duplicate rows even when the table has a precombineKey
Key: HUDI-6410
URL: https://issues.apache.org/jira/browse/HUDI-6410
Project: Apache Hudi
Issue Type: Bug
Reporter: Aditya Goenka
Attachments: image-2023-06-19-15-16-58-055.png
MERGE INTO produces duplicate rows for the same primary key even when a precombine key is defined.
Example:
spark-sql> create table spark_mor_no_pre_t5 (
> id int,
> name string,
> updated_at timestamp
> ) using hudi
> options (
> type = 'mor',
> primaryKey = 'id',
> precombineKey = 'updated_at'
> ) location 'file:///tmp/output/spark_mor_no_pre_t4';
Time taken: 0.363 seconds
spark-sql>
> merge into spark_mor_no_pre_t5 as target
> using (
> select 1 as id, 'c' as name, current_timestamp as updated_at
> union select 1 as id,'d' as name, current_timestamp as updated_at
> union select 1 as id,'e' as name, current_timestamp as updated_at
> ) source
> on target.id = source.id
> when matched then update set *
> when not matched then insert *;
Time taken: 3.111 seconds
spark-sql> select * from spark_mor_no_pre_t5;
20230619151501003 20230619151501003_0_0 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
20230619151501003 20230619151501003_0_1 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
20230619151501003 20230619151501003_0_2 1 4405350d-edd6-465b-ac43-8a68d26f957e-0_0-245-274_20230619151501003.parquet 1 e 2023-06-19 15:15:01.032766
Time taken: 0.288 seconds, Fetched 3 row(s)
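For comparison, the deduplication one would expect from primaryKey = 'id' plus precombineKey = 'updated_at' can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Hudi's implementation: `precombine` is a hypothetical helper, and its tie-breaking rule (a later row wins when precombine values are equal) is an assumption. Ties matter here. The output above shows the same updated_at on all three rows, consistent with Spark evaluating current_timestamp once per query, so only arrival order can pick a winner, and a SQL UNION does not guarantee that order.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def precombine(rows: Iterable[Row], record_key: str, precombine_field: str) -> List[Row]:
    """Collapse rows that share a record key into a single row per key.

    Hypothetical helper for illustration only; not a Hudi API.
    The row with the largest precombine value wins. On a tie, this sketch
    keeps the row seen last (an assumption, not documented Hudi behavior).
    """
    latest: Dict[Any, Row] = {}
    for row in rows:
        key = row[record_key]
        current = latest.get(key)
        if current is None or row[precombine_field] >= current[precombine_field]:
            latest[key] = row
    return list(latest.values())


# Source rows mirroring the MERGE INTO example: same id, identical timestamps.
ts = datetime(2023, 6, 19, 15, 15, 1, 32766)
source = [
    {"id": 1, "name": "c", "updated_at": ts},
    {"id": 1, "name": "d", "updated_at": ts},
    {"id": 1, "name": "e", "updated_at": ts},
]

result = precombine(source, "id", "updated_at")
print(len(result), result[0]["name"])  # 1 e
```

Under these assumptions the three source rows collapse to a single row for id = 1, whereas the table above ends up with three.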
GitHub issue: https://github.com/apache/hudi/issues/8916
--
This message was sent by Atlassian Jira
(v8.20.10#820010)