ergunbaris commented on issue #11718:
URL: https://github.com/apache/hudi/issues/11718#issuecomment-2265252110

   @KnightChess Thanks for the question. I have looked into the production code 
and the `.hoodie` directory, and the timeline is as follows:
   - Replay-fix (INSERT_OVERWRITE) ran between 2024-04-24 15:48:59.886044+00:00 
and 2024-04-27 01:45:39.598175+00:00
   - The Hudi cleaner ran at 20240501084821521 (this is from the hudi-cli 
`cleans show` output; I assume it is UTC, but I am not sure) for 44 minutes and 
deleted all the unreferenced parquet files, except in the problematic date 
partitions
   - All the replacecommits were archived on May 1, 2024, 11:33:47 (UTC+01:00), 
which is May 1, 2024, 10:33:47 (UTC+00:00)
   
   So all the replacecommits were archived after the cleaner ran.
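A quick way to double-check this ordering is to convert the cleaner's instant time and compare it with the other two timestamps. This is a minimal sketch, assuming the instant uses the 17-digit `yyyyMMddHHmmssSSS` form, marks the cleaner's start, and that the cleaner ran for the reported 44 minutes. Because it is unclear whether the instant is UTC or local time (UTC+01:00), the sketch checks both readings. The names `parse_instant` and `archival_margin` are illustrative helpers, not Hudi APIs.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
UTC_PLUS_1 = timezone(timedelta(hours=1))

# Event times taken from the timeline in the comment.
REPLAY_FIX_END = datetime(2024, 4, 27, 1, 45, 39, 598175, tzinfo=UTC)
CLEANER_INSTANT = "20240501084821521"
CLEANER_DURATION = timedelta(minutes=44)  # approximate, as reported
ARCHIVAL = datetime(2024, 5, 1, 11, 33, 47, tzinfo=UTC_PLUS_1)  # == 10:33:47 UTC


def parse_instant(instant: str, tz: timezone) -> datetime:
    """Parse a 17-digit instant time (yyyyMMddHHmmssSSS) in the given zone."""
    base = datetime.strptime(instant[:14], "%Y%m%d%H%M%S")
    millis = int(instant[14:])
    return (base + timedelta(milliseconds=millis)).replace(tzinfo=tz)


def archival_margin(tz: timezone) -> timedelta:
    """Return how long after the cleaner finished the archival happened.

    Assumption: the instant marks the cleaner's start and it ran for
    CLEANER_DURATION.
    """
    start = parse_instant(CLEANER_INSTANT, tz)
    end = start + CLEANER_DURATION
    # The cleaner must have started after the replay-fix finished...
    assert REPLAY_FIX_END < start
    # ...and the archival must have come after the cleaner finished.
    assert end < ARCHIVAL
    # Aware datetimes with different offsets are compared/subtracted in UTC.
    return ARCHIVAL - end


if __name__ == "__main__":
    for label, tz in (("UTC", UTC), ("UTC+01:00", UTC_PLUS_1)):
        print(f"instant read as {label}: archived {archival_margin(tz)} after the clean finished")
```

Under either reading the archival lands at least an hour after the cleaner finished, so the ordering conclusion does not depend on which time zone the instant uses.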
   
   On the other hand, the first replacecommits to be archived would always be 
the ones for the oldest date partitions. But in our case the oldest date 
partitions were processed successfully, and it was the date partitions from the 
batches toward the end of the process that were duplicated!

