YuweiXiao commented on issue #5777: URL: https://github.com/apache/hudi/issues/5777#issuecomment-1157542841
Hi @jjtjiang, looking at the data you posted, I am wondering whether you enabled the de-duplication option when writing. There are records with the same key in a single commit (written to log files). Those records are merged during compaction, which could explain the result you see, i.e., no duplicates after a while (once compaction has run). For de-dup options, check out https://hudi.incubator.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates
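
To make the point above concrete, here is a minimal sketch and not the actual Hudi code path. The option keys are the ones I believe the linked configurations page documents. The table name, the `id` and `ts` fields, and the `merge_by_key` helper are hypothetical and only for illustration. The helper is a toy model that assumes the record with the largest ordering value wins per key; the real merge depends on the payload class configured for the table.

```python
# Hypothetical write options: table name and field names are placeholders,
# not taken from the issue.
hudi_options = {
    "hoodie.table.name": "example_table",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "insert",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    # The option the linked docs anchor points at.
    "hoodie.datasource.write.insert.drop.duplicates": "true",
}


def merge_by_key(records, key_field="id", ordering_field="ts"):
    """Toy model of a merge: keep one record per key, the one with the
    largest ordering value (ties go to the later record in the input)."""
    latest = {}
    for rec in records:
        key = rec[key_field]
        if key not in latest or rec[ordering_field] >= latest[key][ordering_field]:
            latest[key] = rec
    return list(latest.values())


# Three records written in one commit; two of them share the key id=1.
commit = [
    {"id": 1, "ts": 10, "value": "a"},
    {"id": 1, "ts": 20, "value": "b"},
    {"id": 2, "ts": 15, "value": "c"},
]

# Before the merge a reader can see both id=1 rows; afterwards only one remains.
merged = merge_by_key(commit)
```

With PySpark, a dict like this would typically be passed as `df.write.format("hudi").options(**hudi_options)`, where `df` and the target path are yours to supply.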
