abhishekshenoy edited a comment on issue #3313:
URL: https://github.com/apache/hudi/issues/3313#issuecomment-1054225531
@nsivabalan I see the issue is closed, but in 0.10.1 I still get
duplicates when a timestamp column is part of the composite record key.
```
hoodiConfigs.put("hoodie.insert.shuffle.parallelism", "1")
hoodiConfigs.put("hoodie.upsert.shuffle.parallelism", "1")
hoodiConfigs.put("hoodie.bulkinsert.shuffle.parallelism", "1")
hoodiConfigs.put("hoodie.delete.shuffle.parallelism", "1")
hoodiConfigs.put("hoodie.datasource.write.row.writer.enable", "true")
hoodiConfigs.put("hoodie.table.keygenerator.class", classOf[ComplexKeyGenerator].getName)
hoodiConfigs.put("hoodie.datasource.write.keygenerator.class", classOf[ComplexKeyGenerator].getName)
hoodiConfigs.put("hoodie.datasource.write.recordkey.field", "transactionId,storeNbr,transactionTs")
hoodiConfigs.put("hoodie.datasource.write.precombine.field", "messageMetadata.srcLoadTs")
hoodiConfigs.put("hoodie.table.precombine.field", "messageMetadata.srcLoadTs")
hoodiConfigs.put("hoodie.datasource.write.partitionpath.field", "transactionDt")
hoodiConfigs.put("hoodie.datasource.write.payload.class", classOf[DefaultHoodieRecordPayload].getName)
hoodiConfigs.put("hoodie.datasource.write.hive_style_partitioning", "true")
hoodiConfigs.put("hoodie.datasource.write.table.type", COW_TABLE_TYPE_OPT_VAL)
hoodiConfigs.put("hoodie.combine.before.upsert", "true")
hoodiConfigs.put("hoodie.table.name", "huditransaction")
hoodiConfigs.put("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled", "true")
```
With BULK_INSERT_OPERATION_OPT_VAL, the output dataset after the first
insert is not deduplicated within the batch, even with combine before
insert set to true. The key (abc, 4162, 2022-02-25 05:08:10.073) appears
twice:
```
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                                                   |_hoodie_partition_path  |_hoodie_file_name                                                      |transactionId|storeNbr|transactionTs          |messageMetadata                                   |prefixes  |dummyInt|dummyLong|dummyObjects    |transactionDt|
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
|20220228210614823  |20220228210614823_0_1|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210614823  |20220228210614823_0_2|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-26 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210614823  |20220228210614823_0_3|transactionId:bcd,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|bcd          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210614823  |20220228210614823_0_4|transactionId:cde,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|cde          |4163    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210614823  |20220228210614823_0_5|transactionId:def,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|def          |4163    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
```
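For comparison, this is the result I would expect from combining within
the batch: one record per composite key, keeping the one with the
largest messageMetadata.srcLoadTs. It is a minimal sketch in plain Scala
collections, not Hudi's implementation; the Txn case class and the
combine function are hypothetical names used only for illustration.

```scala
object PrecombineSketch {
  // Hypothetical record shape: only the key fields and the precombine field.
  final case class Txn(transactionId: String, storeNbr: Int,
                       transactionTs: String, srcLoadTs: String)

  // Keep one record per (transactionId, storeNbr, transactionTs),
  // choosing the one with the greatest srcLoadTs. Timestamps formatted
  // as "yyyy-MM-dd HH:mm:ss" sort correctly as plain strings.
  def combine(batch: Seq[Txn]): Seq[Txn] =
    batch
      .groupBy(t => (t.transactionId, t.storeNbr, t.transactionTs))
      .values
      .map(_.maxBy(_.srcLoadTs))
      .toSeq
      .sortBy(_.transactionId)

  // The five rows of the first batch, reduced to the relevant fields.
  val batch: Seq[Txn] = Seq(
    Txn("abc", 4162, "2022-02-25 05:08:10.073", "2022-02-25 05:09:10"),
    Txn("abc", 4162, "2022-02-25 05:08:10.073", "2022-02-26 05:09:10"),
    Txn("bcd", 4162, "2022-02-25 05:08:10.073", "2022-02-25 05:09:10"),
    Txn("cde", 4163, "2022-02-25 05:08:10.073", "2022-02-25 05:09:10"),
    Txn("def", 4163, "2022-02-25 05:08:10.073", "2022-02-25 05:09:10")
  )

  def main(args: Array[String]): Unit =
    combine(batch).foreach(println)
}
```

With this input, four records remain and the surviving abc record is the
one with srcLoadTs 2022-02-26 05:09:10.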
Republishing the same data with UPSERT_OPERATION_OPT_VAL results in
duplicates, and the values of transactionTs and messageMetadata.srcLoadTs
in the records loaded with BULK_INSERT_OPERATION_OPT_VAL have changed.
1. In the record key, transactionTs is an epoch value in microseconds
(1645745890073000) for records loaded with UPSERT_OPERATION_OPT_VAL and a
formatted timestamp string (2022-02-25 05:08:10.073) for records loaded
with BULK_INSERT_OPERATION_OPT_VAL, so the keys never match.
2. With UPSERT_OPERATION_OPT_VAL, combine before insert works correctly.
3. For the bulk-inserted records, transactionTs has changed to
1970-01-20 06:39:05.890073 and messageMetadata.srcLoadTs to values such
as 1970-01-20 06:39:05.95.
```
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+------------------------------------------------------------------------+-------------+--------+--------------------------+-----------------------------------------------------+----------+--------+---------+----------------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                                                   |_hoodie_partition_path  |_hoodie_file_name                                                       |transactionId|storeNbr|transactionTs             |messageMetadata                                      |prefixes  |dummyInt|dummyLong|dummyObjects    |transactionDt|
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+------------------------------------------------------------------------+-------------+--------+--------------------------+-----------------------------------------------------+----------+--------+---------+----------------+-------------+
|20220228210614823  |20220228210614823_0_1|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |abc          |4162    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:39:05.95, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210614823  |20220228210614823_0_2|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |abc          |4162    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:40:32.35, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210614823  |20220228210614823_0_3|transactionId:bcd,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |bcd          |4162    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:39:05.95, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210614823  |20220228210614823_0_4|transactionId:cde,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |cde          |4163    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:39:05.95, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210614823  |20220228210614823_0_5|transactionId:def,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |def          |4163    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:39:05.95, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210729355  |20220228210729355_0_1|transactionId:bcd,storeNbr:4162,transactionTs:1645745890073000       |transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-26-20_20220228210729355.parquet|bcd          |4162    |2022-02-25 05:08:10.073   |{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}   |[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210729355  |20220228210729355_0_2|transactionId:cde,storeNbr:4163,transactionTs:1645745890073000       |transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-26-20_20220228210729355.parquet|cde          |4163    |2022-02-25 05:08:10.073   |{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}   |[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210729355  |20220228210729355_0_3|transactionId:def,storeNbr:4163,transactionTs:1645745890073000       |transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-26-20_20220228210729355.parquet|def          |4163    |2022-02-25 05:08:10.073   |{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}   |[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
|20220228210729355  |20220228210729355_0_4|transactionId:abc,storeNbr:4162,transactionTs:1645745890073000       |transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-26-20_20220228210729355.parquet|abc          |4162    |2022-02-25 05:08:10.073   |{key, value, 1, 2, 2022-02-26 05:09:10, 1, { -> }}   |[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
+-------------------+---------------------+---------------------------------------------------------------------+------------------------+------------------------------------------------------------------------+-------------+--------+--------------------------+-----------------------------------------------------+----------+--------+---------+----------------+-------------+
```
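The 1970 values in the output above are numerically consistent with an
epoch value in milliseconds being read back as microseconds. This is an
inference from the numbers only, not a confirmed root cause. A minimal
sketch of the arithmetic, assuming timestamps are displayed at UTC+05:30
(an assumption, chosen because the key value 1645745890073000 then renders
as the 2022-02-25 05:08:10.073 shown in the table); the object and
function names are hypothetical:

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.temporal.ChronoUnit

object TimestampUnitCheck {
  // Assumption: display offset of the session that produced the tables.
  val displayOffset: ZoneOffset = ZoneOffset.ofHoursMinutes(5, 30)

  // Interpret a long as microseconds since the Unix epoch.
  def fromMicros(micros: Long): LocalDateTime =
    LocalDateTime.ofInstant(
      Instant.EPOCH.plus(micros, ChronoUnit.MICROS), displayOffset)

  def main(args: Array[String]): Unit = {
    val keyMicros = 1645745890073000L   // transactionTs in the upsert record key
    val asMillis  = keyMicros / 1000    // the same instant in milliseconds

    // Correct unit: the original transactionTs.
    println(fromMicros(keyMicros))      // 2022-02-25T05:08:10.073
    // Milliseconds misread as microseconds: the value seen after the upsert.
    println(fromMicros(asMillis))       // 1970-01-20T06:39:05.890073
  }
}
```

The same factor of 1000 maps srcLoadTs 2022-02-25 05:09:10 to
1970-01-20 06:39:05.95 and 2022-02-26 05:09:10 to 1970-01-20 06:40:32.35,
which are the values shown in messageMetadata for the bulk-inserted rows.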