ad1happy2go commented on issue #7829: URL: https://github.com/apache/hudi/issues/7829#issuecomment-1486853751
@jtmzheng This issue was partially resolved in Spark with this JIRA - https://issues.apache.org/jira/browse/SPARK-23599. However, if you check the last comment on that JIRA, someone reported a similar duplicate issue with the UUID function, like the one you are seeing:

> "We have encountered this problem with Spark 3.1.2, resulting in duplicate values in a situation where a spark executor died. As suggested in the description, this error was hard to track down and difficult to replicate."

How frequently does this happen for you?

As a workaround, can you use a combination of monotonically_increasing_id and uuid to ensure the key is always unique (see the sketch below)? It may cost a small performance hit from generating such a large id, but the result should always be unique.
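
A minimal sketch of that workaround, assuming a Spark DataFrame read from some source (the path and column name `record_key` are illustrative, not from your setup): concatenate `monotonically_increasing_id()` with `uuid()` so that even if one of the two functions repeats a value across a task retry, the combined key stays unique.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat_ws, expr, monotonically_increasing_id}

val spark = SparkSession.builder().appName("unique-key-example").getOrCreate()

// Hypothetical source path; replace with your actual input.
val df = spark.read.parquet("/path/to/source")

// Combine both functions into a single record key column.
val withKey = df.withColumn(
  "record_key",
  concat_ws("_", monotonically_increasing_id(), expr("uuid()"))
)
```

You could then use `record_key` as the Hudi record key field when writing the table.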
