hudi-bot opened a new issue, #16007:
URL: https://github.com/apache/hudi/issues/16007

   We recently experienced a large data loss in one of our largest Hudi tables.
We observed that entire partitions in our table were being deleted, but we were
initially unsure why. After a deep analysis of the code, we traced the deletes to
the Cleaning service, specifically the logic that decides whether a given
partition is empty. We are running Hudi 0.12.3, so this is the code I am
referencing:

https://github.com/apache/hudi/blob/release-0.12.3/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L370

   The root cause of our issue is that the Metadata Table (MDT) we use became
inconsistent with the underlying filesystem (we have not yet determined how this
happened). We did not have any auditing of the MDT to alert us to
inconsistencies, so the MDT remained in this state for a considerable amount of
time.
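
An audit of this kind could be as simple as periodically comparing two partition lists and alerting on any difference. The Java sketch below is hypothetical: the class and method names are invented for illustration, and it assumes the caller has already gathered the partition paths reported by the MDT and the partition directories found on storage (for example, through a direct filesystem listing).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical MDT audit sketch: compares the partitions the metadata table
// (MDT) knows about with the partitions actually present on storage.
// How each set is collected is an assumption left to the caller.
public class MdtPartitionAudit {

    // Partitions present on storage but unknown to the MDT (the dangerous case
    // described in this issue).
    static Set<String> missingFromMdt(Set<String> onStorage, Set<String> inMdt) {
        Set<String> diff = new TreeSet<>(onStorage);
        diff.removeAll(inMdt);
        return diff;
    }

    // Partitions the MDT still lists but that no longer exist on storage.
    static Set<String> missingFromStorage(Set<String> onStorage, Set<String> inMdt) {
        Set<String> diff = new TreeSet<>(inMdt);
        diff.removeAll(onStorage);
        return diff;
    }

    public static void main(String[] args) {
        Set<String> onStorage = Set.of("2023/06/01", "2023/06/02");
        Set<String> inMdt = Set.of("2023/06/01");
        Set<String> missing = missingFromMdt(onStorage, inMdt);
        // Any difference in either direction means the MDT has drifted.
        if (!missing.isEmpty() || !missingFromStorage(onStorage, inMdt).isEmpty()) {
            System.out.println("MDT drift detected, missing from MDT: " + missing);
        }
    }
}
```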
   
   Because of the inconsistencies, many partitions existed on disk but did not
exist in the MDT. A full, non-incremental clean was run on the table, which
caused the Cleaner to scan all partitions in the table and compare what was on
disk with what was in the MDT. The Cleaner mistakenly considered all of these
partitions to be empty (even though they were not) and proceeded to perform a
recursive delete of each of them.
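
To make the failure mode concrete, here is a simplified Java model of the behavior described above. It is not Hudi's `CleanPlanner` code; it only assumes, as this issue describes, that the emptiness decision trusted the MDT's file listing, so a partition absent from the MDT looks exactly like a partition with no files. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model (NOT Hudi's actual CleanPlanner logic) of how a stale
// metadata-table (MDT) listing can make non-empty partitions look empty.
public class StaleMdtCleanModel {

    // Returns the on-disk partitions that would be treated as "empty" when the
    // file listing is taken from the MDT instead of from storage.
    static Set<String> partitionsDeemedEmpty(Set<String> partitionsOnDisk,
                                             Map<String, List<String>> mdtFileListing) {
        Set<String> empty = new TreeSet<>();
        for (String partition : partitionsOnDisk) {
            // A partition missing from the MDT yields an empty listing,
            // indistinguishable from a partition that truly has no files.
            List<String> files = mdtFileListing.getOrDefault(partition, List.of());
            if (files.isEmpty()) {
                empty.add(partition);
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        Set<String> onDisk = Set.of("2023/06/01", "2023/06/02", "2023/06/03");
        // The MDT has drifted: it only knows about one of the three partitions.
        Map<String, List<String>> mdt = Map.of("2023/06/01", List.of("f1.parquet"));
        // Prints [2023/06/02, 2023/06/03]: two populated partitions flagged as empty.
        System.out.println(partitionsDeemedEmpty(onDisk, mdt));
    }
}
```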
   
   Due to the high-risk nature of partition deletes, I propose a configuration
that allows Hudi operators to disable partition deletes on critical tables
where deleting an entire partition is never desired. All of our time-series
Hudi tables fall into this category.
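
A minimal sketch of the proposed safeguard, in Java. The property name `hoodie.clean.partition.delete.enabled` and the `PartitionDeleteGuard` class are hypothetical and do not exist in Hudi; they only illustrate the shape of a flag that, when set to `false`, removes whole-partition deletes from a clean plan regardless of what the emptiness check concluded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of the proposed safeguard. The property name below is
// invented for illustration and is NOT an existing Hudi configuration.
public class PartitionDeleteGuard {

    static final String PARTITION_DELETE_ENABLED = "hoodie.clean.partition.delete.enabled";

    // Returns the partitions the cleaner may delete outright. When the flag is
    // false, no whole-partition deletes are planned, even if the emptiness
    // check (possibly fed by a stale MDT) reported candidates.
    static List<String> partitionsToDelete(List<String> candidateEmptyPartitions,
                                           Properties props) {
        // Default "true" preserves the current behavior for existing tables.
        boolean enabled = Boolean.parseBoolean(
                props.getProperty(PARTITION_DELETE_ENABLED, "true"));
        if (!enabled) {
            return List.of();
        }
        return new ArrayList<>(candidateEmptyPartitions);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PARTITION_DELETE_ENABLED, "false");
        // Prints []: the guard suppresses the delete.
        System.out.println(partitionsToDelete(List.of("2023/06/02"), props));
    }
}
```

Defaulting the flag to `true` keeps today's behavior for existing tables, while operators of critical time-series tables could opt out explicitly.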
   
   NOTE: I see that the master branch (not yet released) contains some
improvements to the logic that determines whether a partition is empty. These
improvements are great, but due to the high-risk nature of these partition
deletes, I still propose adding a configuration so that users can fully disable
partition deletes on tables that should never have a partition deleted.
   
   Recent changes: 
https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L392
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6339
   - Type: Improvement


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
