YuweiXiao commented on issue #5777:
URL: https://github.com/apache/hudi/issues/5777#issuecomment-1157542841

   Hi @jjtjiang, looking at the data you posted, I am wondering whether you 
enabled the de-duplication option when writing. I ask because there are 
records with the same key in a single commit (written to log files).
   
   Those records will be merged during compaction, which could explain the 
result you see, i.e., the duplicates disappear after a while (once 
compaction has run).
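
As a rough illustration of why duplicates can vanish after compaction, here is a toy Python model of merging records that share a key. It is not Hudi code: the function name `merge_by_key`, the field names `id` and `ts`, and the "keep the record with the larger ordering value" rule are all assumptions made for this sketch.

```python
def merge_by_key(records, key_field="id", order_field="ts"):
    """Toy model: keep one record per key, preferring the larger order_field.

    This only mimics the idea of merging same-key records; the real merge
    logic depends on the table's configured payload/merger.
    """
    latest = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        # On ties, the later record in the input wins.
        if current is None or rec[order_field] >= current[order_field]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical log-file contents: key 1 appears twice in the same commit.
log_records = [
    {"id": 1, "ts": 1, "value": "a"},
    {"id": 2, "ts": 1, "value": "b"},
    {"id": 1, "ts": 2, "value": "c"},
]
merged = merge_by_key(log_records)
```

Before the merge a reader of the raw log records could see key 1 twice; after it, only one record per key remains, which is consistent with the behaviour described above.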
   
   For the de-duplication options, see 
https://hudi.incubator.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates
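
A minimal sketch of how such options might be passed from PySpark, assuming an insert into a merge-on-read table. The helper `build_write_options` and the table and field names are hypothetical; the option keys are taken from the Hudi configuration reference, but which of them matter depends on the write operation in use, so treat this as a starting point rather than a confirmed fix for this issue.

```python
def build_write_options(table_name, record_key, precombine_field):
    """Build a dict of Hudi write options with de-duplication turned on.

    The helper itself is hypothetical; only the option keys come from
    the Hudi configuration reference.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "insert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Filter duplicate records out of the incoming data on insert.
        "hoodie.datasource.write.insert.drop.duplicates": "true",
        # Combine records sharing a key within the batch before writing.
        "hoodie.combine.before.insert": "true",
    }


# Hypothetical table and column names.
opts = build_write_options("my_table", "id", "ts")

# Assumed usage with an existing Spark DataFrame `df` and a base path:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```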


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
