nsivabalan commented on issue #3478: URL: https://github.com/apache/hudi/issues/3478#issuecomment-1018642537
@affei: not sure if this matters, but for a partitioned dataset it is the pair of partition path and record key that is unique within a given Hudi table. So there could be duplicate record keys in the output across different partitions. Can you confirm that when you said you are seeing duplicates, you meant duplicate records having the same value for both partition path and record key? If you wish to have globally unique record keys, you may have to choose one of the GLOBAL index types (for example GLOBAL_BLOOM or GLOBAL_SIMPLE, set via hoodie.index.type).
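To make the distinction concrete, here is a minimal sketch of how one might check for duplicates both ways. It assumes the rows have already been pulled out of the table (e.g. collected from a Spark read) into plain Python dicts carrying Hudi's _hoodie_partition_path and _hoodie_record_key metadata columns. The sample rows and the duplicate_keys helper are made up for illustration.

```python
from collections import Counter

# Hypothetical rows read back from a partitioned Hudi table. Only the two
# Hudi metadata columns relevant to uniqueness are shown.
rows = [
    {"_hoodie_partition_path": "2022/01/20", "_hoodie_record_key": "id-1"},
    {"_hoodie_partition_path": "2022/01/21", "_hoodie_record_key": "id-1"},
    {"_hoodie_partition_path": "2022/01/21", "_hoodie_record_key": "id-2"},
]


def duplicate_keys(rows, by_partition=True):
    """Return the keys that occur more than once (hypothetical helper).

    by_partition=True keys on (partition path, record key), the pair a
    partitioned table keeps unique with a non-global index.
    by_partition=False keys on the record key alone, the stricter notion
    of uniqueness that needs a GLOBAL index type.
    """
    if by_partition:
        keys = [(r["_hoodie_partition_path"], r["_hoodie_record_key"]) for r in rows]
    else:
        keys = [r["_hoodie_record_key"] for r in rows]
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


print(duplicate_keys(rows, by_partition=True))   # []
print(duplicate_keys(rows, by_partition=False))  # ['id-1']
```

In this sample, "id-1" shows up in two partitions, so it is only reported as a duplicate when partition path is ignored. That is the case that is expected behavior without a global index.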
