abhishekshenoy edited a comment on issue #3313:
URL: https://github.com/apache/hudi/issues/3313#issuecomment-1054225531
@nsivabalan I see this issue is closed, but in 0.10.1 I still get duplicates when I include a timestamp column in a composite record key.
```scala
import scala.collection.mutable

import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.common.model.DefaultHoodieRecordPayload
import org.apache.hudi.keygen.ComplexKeyGenerator

// Declaration assumed; the original snippet does not show how hoodiConfigs is created
val hoodiConfigs = mutable.Map[String, String]()

// Shuffle parallelism set to 1 for every operation
hoodiConfigs.put("hoodie.insert.shuffle.parallelism", "1")
hoodiConfigs.put("hoodie.upsert.shuffle.parallelism", "1")
hoodiConfigs.put("hoodie.bulkinsert.shuffle.parallelism", "1")
hoodiConfigs.put("hoodie.delete.shuffle.parallelism", "1")
hoodiConfigs.put("hoodie.datasource.write.row.writer.enable", "true")

// Composite record key that includes the timestamp column transactionTs
hoodiConfigs.put("hoodie.table.keygenerator.class", classOf[ComplexKeyGenerator].getName)
hoodiConfigs.put("hoodie.datasource.write.keygenerator.class", classOf[ComplexKeyGenerator].getName)
hoodiConfigs.put("hoodie.datasource.write.recordkey.field", "transactionId,storeNbr,transactionTs")
hoodiConfigs.put("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled", "true")

// Precombine field: among records with the same key, the largest value wins
hoodiConfigs.put("hoodie.datasource.write.precombine.field", "messageMetadata.srcLoadTs")
hoodiConfigs.put("hoodie.table.precombine.field", "messageMetadata.srcLoadTs")
hoodiConfigs.put("hoodie.datasource.write.payload.class", classOf[DefaultHoodieRecordPayload].getName)
hoodiConfigs.put("hoodie.combine.before.upsert", "true")

// Partitioning and table settings
hoodiConfigs.put("hoodie.datasource.write.partitionpath.field", "transactionDt")
hoodiConfigs.put("hoodie.datasource.write.hive_style_partitioning", "true")
hoodiConfigs.put("hoodie.datasource.write.table.type", COW_TABLE_TYPE_OPT_VAL)
hoodiConfigs.put("hoodie.table.name", "huditransaction")
```
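For reference, the `_hoodie_record_key` values in the output below take the form `field:value`, comma-separated, following the field order in `hoodie.datasource.write.recordkey.field`. This pure-Scala sketch reproduces that layout for one row, so keys can be compared as plain strings. `complexRecordKey` is a hypothetical helper and not Hudi's `ComplexKeyGenerator` API. It also assumes the field values are already formatted as strings.

```scala
// Hypothetical helper, not Hudi's ComplexKeyGenerator: joins "field:value"
// pairs with commas, in the order the fields are listed.
object CompositeKeySketch {
  def complexRecordKey(fields: Seq[String], row: Map[String, String]): String =
    fields.map(f => s"$f:${row(f)}").mkString(",")

  def main(args: Array[String]): Unit = {
    // Field list copied from hoodie.datasource.write.recordkey.field
    val fields = Seq("transactionId", "storeNbr", "transactionTs")
    // Values as they appear in the first output row (assumed pre-formatted strings)
    val row = Map(
      "transactionId" -> "abc",
      "storeNbr" -> "4162",
      "transactionTs" -> "2022-02-25 05:08:10.073"
    )
    // Prints transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073
    println(complexRecordKey(fields, row))
  }
}
```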
I tried both `BULK_INSERT_OPERATION_OPT_VAL` and `UPSERT_OPERATION_OPT_VAL`.

Here is the output dataset after the first insert. Combining before insert did not work: the key (abc, 4162, 2022-02-25T05:08:10.73-05:00) appears twice in the same commit.
```
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                                                   |_hoodie_partition_path  |_hoodie_file_name                                                      |transactionId|storeNbr|transactionTs          |messageMetadata                                   |prefixes  |dummyInt|dummyLong|dummyObjects    |transactionDt|
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
|20220228183206147  |20220228183206147_0_1|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183206147  |20220228183206147_0_2|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183206147  |20220228183206147_0_3|transactionId:bcd,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|bcd          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183206147  |20220228183206147_0_4|transactionId:cde,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|cde          |4163    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183206147  |20220228183206147_0_5|transactionId:def,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|def          |4163    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
```
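With combining enabled, the expected outcome is that the two rows for key (abc, 4162, ...) in commit 20220228183206147 collapse into one row before writing. The sketch below is a simplified model of that deduplication, not Hudi's code path. It assumes a non-global index, so the dedup key is the record key plus the partition path. It also flattens the `messageMetadata.srcLoadTs` ordering value into a hypothetical `Long` field. Among records sharing a key, the one with the larger ordering value wins, and ties keep the later record in input order.

```scala
// Simplified, hypothetical model of combine-before-write deduplication.
// Not Hudi's implementation: srcLoadTs stands in for messageMetadata.srcLoadTs.
final case class BatchRecord(recordKey: String, partitionPath: String, srcLoadTs: Long, payload: String)

object PreCombineSketch {
  // Group by (record key, partition path), as with a non-global index, and keep the
  // record with the larger ordering value; on a tie the later record in input order wins.
  def dedupe(batch: Seq[BatchRecord]): Seq[BatchRecord] =
    batch
      .groupBy(r => (r.recordKey, r.partitionPath))
      .values
      .map(_.reduce((older, newer) => if (older.srcLoadTs > newer.srcLoadTs) older else newer))
      .toSeq

  def main(args: Array[String]): Unit = {
    val key = "transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073"
    val path = "transactionDt=2022-02-25"
    // Hypothetical ordering values; the real srcLoadTs values are not shown in the output
    val batch = Seq(
      BatchRecord(key, path, 1L, "first"),
      BatchRecord(key, path, 2L, "second")
    )
    // One record survives for the key: the one with srcLoadTs = 2
    dedupe(batch).foreach(println)
  }
}
```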
Republishing an additional record for the same key (abc, 4162, 2022-02-25T05:08:10.73-05:00) does not deduplicate it either. A third row for that key is written to a new file in commit 20220228183821299:
```
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                                                   |_hoodie_partition_path  |_hoodie_file_name                                                      |transactionId|storeNbr|transactionTs          |messageMetadata                                   |prefixes  |dummyInt|dummyLong|dummyObjects    |transactionDt|
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
|20220228183206147  |20220228183206147_0_1|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183206147  |20220228183206147_0_2|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183206147  |20220228183206147_0_3|transactionId:bcd,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|bcd          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183206147  |20220228183206147_0_4|transactionId:cde,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|cde          |4163    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183206147  |20220228183206147_0_5|transactionId:def,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|f86d9a60-8465-410d-bca6-c478bf3a48e9-0_0-10-0_20220228183206147.parquet|def          |4163    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228183821299  |20220228183821299_0_1|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|66ee158e-93f3-4ccc-8b2a-1712c3cdf5cf-0_0-2-0_20220228183821299.parquet |abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
```
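You can spot this kind of duplication, both within a commit and across commits, by grouping rows on the `_hoodie_record_key` metadata column and keeping `_hoodie_commit_time` alongside. Here is a minimal pure-Scala sketch over `(commitTime, recordKey)` pairs. The `duplicates` helper is hypothetical.

```scala
// Hypothetical helper: given (commitTime, recordKey) pairs taken from the
// _hoodie_commit_time and _hoodie_record_key columns, return every record key
// that occurs more than once, with the commit times of each occurrence.
object DuplicateKeySketch {
  def duplicates(rows: Seq[(String, String)]): Map[String, Seq[String]] =
    rows
      .groupBy { case (_, recordKey) => recordKey }
      .collect { case (recordKey, hits) if hits.size > 1 => recordKey -> hits.map(_._1) }

  def main(args: Array[String]): Unit = {
    val abc = "transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073"
    val bcd = "transactionId:bcd,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073"
    // A subset of the rows from the second output
    val rows = Seq(
      "20220228183206147" -> abc,
      "20220228183206147" -> abc,
      "20220228183206147" -> bcd,
      "20220228183821299" -> abc
    )
    // Reports the abc key with three commit times: twice the first commit, once the second
    duplicates(rows).foreach { case (key, commits) => println(s"$key -> ${commits.mkString(", ")}") }
  }
}
```

On the actual table, an equivalent Spark check would be `df.groupBy("_hoodie_record_key").count().filter("count > 1").show(false)`, where `df` is the table read back with `spark.read.format("hudi").load(basePath)` and `basePath` is a placeholder for the table location.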