ad1happy2go commented on issue #7829: URL: https://github.com/apache/hudi/issues/7829#issuecomment-1486853751
@jtmzheng This issue was partially resolved in Spark with this JIRA - https://issues.apache.org/jira/browse/SPARK-23599. However, if you check the last comment on that JIRA, someone reported a similar duplicate issue with the UUID function, like the one you are seeing:

> "We have encountered this problem with Spark 3.1.2, resulting in duplicate values in a situation where a spark executor died. As suggested in the description, this error was hard to track down and difficult to replicate."

How frequently does this happen for you?

As a workaround, can you use a combination of monotonically_increasing_id and uuid to ensure the key is always unique (see the sketch below)? It may cost a small performance hit from generating such a large id, but the result should always be unique.
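
A minimal sketch of that workaround, assuming a Spark DataFrame read from some source (the path and column name `record_key` are illustrative, not from your setup): concatenate `monotonically_increasing_id()` with `uuid()` so that even if one of the two functions repeats a value across a task retry, the combined key stays unique.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat_ws, expr, monotonically_increasing_id}

val spark = SparkSession.builder().appName("unique-key-example").getOrCreate()

// Hypothetical source path; replace with your actual input.
val df = spark.read.parquet("/path/to/source")

// Combine both functions into a single record key column.
val withKey = df.withColumn(
  "record_key",
  concat_ws("_", monotonically_increasing_id(), expr("uuid()"))
)
```

You could then use `record_key` as the Hudi record key field when writing the table.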
