abhishekshenoy edited a comment on issue #3313:
URL: https://github.com/apache/hudi/issues/3313#issuecomment-1054225531


   @nsivabalan I see the issue is closed, but in 0.10.1 I still get duplicates when a timestamp column is part of the composite record key.
   ```
       hoodiConfigs.put("hoodie.insert.shuffle.parallelism", "1")
       hoodiConfigs.put("hoodie.upsert.shuffle.parallelism", "1")
       hoodiConfigs.put("hoodie.bulkinsert.shuffle.parallelism", "1")
       hoodiConfigs.put("hoodie.delete.shuffle.parallelism", "1")
       hoodiConfigs.put("hoodie.datasource.write.row.writer.enable", "true")
       hoodiConfigs.put("hoodie.table.keygenerator.class", classOf[ComplexKeyGenerator].getName)
       hoodiConfigs.put("hoodie.datasource.write.keygenerator.class", classOf[ComplexKeyGenerator].getName)
       hoodiConfigs.put("hoodie.datasource.write.recordkey.field", "transactionId,storeNbr,transactionTs")
       hoodiConfigs.put("hoodie.datasource.write.precombine.field", "messageMetadata.srcLoadTs")
       hoodiConfigs.put("hoodie.table.precombine.field", "messageMetadata.srcLoadTs")
       hoodiConfigs.put("hoodie.datasource.write.partitionpath.field", "transactionDt")
       hoodiConfigs.put("hoodie.datasource.write.payload.class", classOf[DefaultHoodieRecordPayload].getName)
       hoodiConfigs.put("hoodie.datasource.write.hive_style_partitioning", "true")
       hoodiConfigs.put("hoodie.datasource.write.table.type", COW_TABLE_TYPE_OPT_VAL)
       hoodiConfigs.put("hoodie.combine.before.upsert", "true")
       hoodiConfigs.put("hoodie.table.name", "huditransaction")
       hoodiConfigs.put("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled", "true")
   ```    
   
   With BULK_INSERT_OPERATION_OPT_VAL:

   The output dataset after the first insert is not deduplicated within the batch, even with combine before insert set to true. Note the two rows for key (abc, 4162, 2022-02-25 05:08:10.073):
   ```
   
   +-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
   |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                                                   |_hoodie_partition_path  |_hoodie_file_name                                                      |transactionId|storeNbr|transactionTs          |messageMetadata                                   |prefixes  |dummyInt|dummyLong|dummyObjects    |transactionDt|
   +-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
   |20220228210614823  |20220228210614823_0_1|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210614823  |20220228210614823_0_2|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|abc          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-26 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210614823  |20220228210614823_0_3|transactionId:bcd,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|bcd          |4162    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210614823  |20220228210614823_0_4|transactionId:cde,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|cde          |4163    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210614823  |20220228210614823_0_5|transactionId:def,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet|def          |4163    |2022-02-25 05:08:10.073|{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   +-------------------+---------------------+---------------------------------------------------------------------+------------------------+-----------------------------------------------------------------------+-------------+--------+-----------------------+--------------------------------------------------+----------+--------+---------+----------------+-------------+
   
   ```
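
   For reference, the result I would expect from combining within the batch can be sketched in plain Scala. This is only an illustration of the expected semantics (keep one record per composite key, the one with the greatest precombine value), not Hudi's implementation; `Txn`, `DedupeSketch` and `dedupe` are hypothetical names, and the rows are a simplified copy of the batch above.

   ```scala
   // Hypothetical, simplified stand-in for one row of the batch above.
   final case class Txn(transactionId: String, storeNbr: Int, transactionTs: String, srcLoadTs: String)

   object DedupeSketch {
     // Keep one record per composite key: the one with the greatest srcLoadTs.
     // Timestamps in "yyyy-MM-dd HH:mm:ss" form compare correctly as plain strings.
     def dedupe(batch: Seq[Txn]): Seq[Txn] =
       batch
         .groupBy(t => (t.transactionId, t.storeNbr, t.transactionTs))
         .values
         .map(_.maxBy(_.srcLoadTs))
         .toSeq
         .sortBy(_.transactionId)

     def main(args: Array[String]): Unit = {
       val ts = "2022-02-25 05:08:10.073"
       val batch = Seq(
         Txn("abc", 4162, ts, "2022-02-25 05:09:10"),
         Txn("abc", 4162, ts, "2022-02-26 05:09:10"), // same key, later srcLoadTs
         Txn("bcd", 4162, ts, "2022-02-25 05:09:10"),
         Txn("cde", 4163, ts, "2022-02-25 05:09:10"),
         Txn("def", 4163, ts, "2022-02-25 05:09:10")
       )
       // Four records remain; for key abc only the 2022-02-26 row is kept.
       dedupe(batch).foreach(println)
     }
   }
   ```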
   
   Republishing the same data with UPSERT_OPERATION_OPT_VAL also produces duplicates, and the values of transactionTs and messageMetadata.srcLoadTs in the records loaded with BULK_INSERT_OPERATION_OPT_VAL have changed.
   
   1. In the record key, transactionTs is an epoch value in microseconds (1645745890073000) for records loaded with UPSERT_OPERATION_OPT_VAL, but a formatted timestamp string (2022-02-25 05:08:10.073) for records loaded with BULK_INSERT_OPERATION_OPT_VAL.
   2. With UPSERT_OPERATION_OPT_VAL, combine before insert works correctly.
   3. In the bulk-inserted rows, transactionTs has changed to 1970-01-20 06:39:05.890073, and messageMetadata.srcLoadTs has likewise changed to a 1970-01-20 value.
   
   ```
   
   +-------------------+---------------------+---------------------------------------------------------------------+------------------------+------------------------------------------------------------------------+-------------+--------+--------------------------+-----------------------------------------------------+----------+--------+---------+----------------+-------------+
   |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                                                   |_hoodie_partition_path  |_hoodie_file_name                                                       |transactionId|storeNbr|transactionTs             |messageMetadata                                      |prefixes  |dummyInt|dummyLong|dummyObjects    |transactionDt|
   +-------------------+---------------------+---------------------------------------------------------------------+------------------------+------------------------------------------------------------------------+-------------+--------+--------------------------+-----------------------------------------------------+----------+--------+---------+----------------+-------------+
   |20220228210614823  |20220228210614823_0_1|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |abc          |4162    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:39:05.95, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210614823  |20220228210614823_0_2|transactionId:abc,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |abc          |4162    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:40:32.35, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210614823  |20220228210614823_0_3|transactionId:bcd,storeNbr:4162,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |bcd          |4162    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:39:05.95, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210614823  |20220228210614823_0_4|transactionId:cde,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |cde          |4163    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:39:05.95, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210614823  |20220228210614823_0_5|transactionId:def,storeNbr:4163,transactionTs:2022-02-25 05:08:10.073|transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-10-0_20220228210614823.parquet |def          |4163    |1970-01-20 06:39:05.890073|{key, value, 1, 2, 1970-01-20 06:39:05.95, 1, { -> }}|[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210729355  |20220228210729355_0_1|transactionId:bcd,storeNbr:4162,transactionTs:1645745890073000       |transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-26-20_20220228210729355.parquet|bcd          |4162    |2022-02-25 05:08:10.073   |{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}   |[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210729355  |20220228210729355_0_2|transactionId:cde,storeNbr:4163,transactionTs:1645745890073000       |transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-26-20_20220228210729355.parquet|cde          |4163    |2022-02-25 05:08:10.073   |{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}   |[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210729355  |20220228210729355_0_3|transactionId:def,storeNbr:4163,transactionTs:1645745890073000       |transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-26-20_20220228210729355.parquet|def          |4163    |2022-02-25 05:08:10.073   |{key, value, 1, 2, 2022-02-25 05:09:10, 1, { -> }}   |[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   |20220228210729355  |20220228210729355_0_4|transactionId:abc,storeNbr:4162,transactionTs:1645745890073000       |transactionDt=2022-02-25|d572dc96-ed78-46ae-8560-430d82456941-0_0-26-20_20220228210729355.parquet|abc          |4162    |2022-02-25 05:08:10.073   |{key, value, 1, 2, 2022-02-26 05:09:10, 1, { -> }}   |[abc, def]|1       |1        |[{a, 1}, {a, 1}]|2022-02-25   |
   +-------------------+---------------------+---------------------------------------------------------------------+------------------------+------------------------------------------------------------------------+-------------+--------+--------------------------+-----------------------------------------------------+----------+--------+---------+----------------+-------------+
   
   ```
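
   The 1970 values are numerically consistent with an epoch-millisecond value being read back as microseconds. That is an assumption on my side, not something confirmed in the Hudi code; the small check below only uses `java.time`, and the UTC+05:30 display offset and the names `TimestampUnitSketch` and `fromMicros` are likewise assumptions for illustration.

   ```scala
   import java.time.{Instant, ZoneOffset}
   import java.time.temporal.ChronoUnit

   object TimestampUnitSketch {
     // Interpret a raw long as microseconds since the Unix epoch.
     def fromMicros(micros: Long): Instant =
       Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

     def main(args: Array[String]): Unit = {
       val keyMicros = 1645745890073000L // value seen in the upsert record keys
       val millis    = keyMicros / 1000L // the same instant expressed in milliseconds
       val ist       = ZoneOffset.ofHoursMinutes(5, 30) // assumed display offset

       // Intended instant: prints 2022-02-25T05:08:10.073+05:30
       println(fromMicros(keyMicros).atOffset(ist))
       // Milliseconds misread as microseconds: prints 1970-01-20T06:39:05.890073+05:30
       println(fromMicros(millis).atOffset(ist))
     }
   }
   ```

   The same misreading applied to srcLoadTs (2022-02-25 05:09:10 and 2022-02-26 05:09:10) gives 06:39:05.95 and 06:40:32.35 on 1970-01-20, which matches the table above.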
   
       

