voonhous commented on issue #17714:
URL: https://github.com/apache/hudi/issues/17714#issuecomment-3691539207

   # Main root cause:
   1. The metadata table (MDT) is stale, causing files that were already 
cleaned by `CLEAN_PLAN_A` to be picked up again as candidates for `CLEAN_PLAN_B`.
   2. `FILEGROUP_A.ver1` is therefore a candidate in both `CLEAN_PLAN_A` and 
`CLEAN_PLAN_B`.
   3. `FILEGROUP_A.ver1` is deleted when `CLEAN_PLAN_A` completes.
   4. When `CLEAN_PLAN_B` executes, it tries to delete `FILEGROUP_A.ver1` again, 
but the file no longer exists.
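
The sequence above can be reproduced with a toy in-memory model. This is only a sketch under stated assumptions: `StaleCleanPlanSketch` and `executePlan` are hypothetical names, the storage is a plain `Set`, and none of this is Hudi's cleaner code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of two clean plans built from a stale listing.
class StaleCleanPlanSketch {

    // Deletes every candidate from storage and returns the candidates
    // that were already missing when the plan ran.
    static List<String> executePlan(Set<String> storage, List<String> candidates) {
        List<String> alreadyMissing = new ArrayList<>();
        for (String file : candidates) {
            if (!storage.remove(file)) {
                alreadyMissing.add(file);
            }
        }
        return alreadyMissing;
    }

    public static void main(String[] args) {
        Set<String> storage =
            new LinkedHashSet<>(List.of("FILEGROUP_A.ver1", "FILEGROUP_A.ver2"));

        // Both plans list the same file because the second plan was
        // computed from a listing that did not reflect the first clean.
        List<String> cleanPlanA = List.of("FILEGROUP_A.ver1");
        List<String> cleanPlanB = List.of("FILEGROUP_A.ver1");

        System.out.println(executePlan(storage, cleanPlanA)); // []
        System.out.println(executePlan(storage, cleanPlanB)); // [FILEGROUP_A.ver1]
    }
}
```

The second plan finds its only candidate already gone, which is the state that the consistency guard then has to reason about.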
   
   # Interplay with consistency guard
   1. Assume the test runs with **consistency guard = true**, i.e. Hudi treats 
the file system as eventually consistent.
   2. Before deleting, Hudi calls 
`storage.getPathInfo(deletePath).isDirectory()` to check whether the path is a 
directory.
   3. This triggers `HoodieHadoopStorage.getPathInfo()` -> 
`FileSystem.getFileStatus()` -> `HoodieWrapperFileSystem.getFileStatus()`
   4. `HoodieWrapperFileSystem.getFileStatus()` calls 
`consistencyGuard.waitTillFileAppears(f)`
   5. The file has already been deleted, so the guard keeps waiting with 
exponential backoff of **200ms** with 6 retries up to a maximum of 
**12,800ms**, for a cumulative wait of **25,200ms**.
   6. Once the backoff is exhausted, the file system is assumed to be 
consistent, so there is high confidence that the file is really deleted 
rather than "in the midst of being propagated".
   7. The file is treated as deleted, and the delete step ends.
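
The cumulative wait in step 5 can be sanity-checked with a small sketch. It assumes a pure doubling schedule and is not the guard's actual code; `BackoffSketch` and `totalWaitMs` are hypothetical names. Six doubling waits of 400, 800, 1,600, 3,200, 6,400 and 12,800 ms add up to the quoted 25,200 ms, while six waits starting at 200 ms would add up to 12,600 ms. Reading the first effective sleep as 400 ms is an assumption here and has not been checked against the Hudi source.

```java
// Hypothetical helper for checking the backoff arithmetic; not Hudi code.
class BackoffSketch {

    // Sums `waits` sleep intervals that start at firstWaitMs and double each time.
    static long totalWaitMs(long firstWaitMs, int waits) {
        long total = 0;
        long wait = firstWaitMs;
        for (int i = 0; i < waits; i++) {
            total += wait;
            wait *= 2; // assumed doubling between attempts
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalWaitMs(400, 6)); // 25200
        System.out.println(totalWaitMs(200, 6)); // 12600
    }
}
```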
   
   # Verification
   I modified the test to disable MDT, and the run time was similar to the run 
with consistency guard disabled: around 40 seconds.
   
   # Final thoughts
   This is definitely not a problem with consistency guard itself. It is 
deliberately very conservative, enforcing an exponential backoff during both 
WRITEs and READs to remove any chance of false positives.
   
   e.g. A file was deleted in a previous clean, but the job doesn't know about 
it. Since the file may either not have been propagated yet or have been 
deleted, Hudi takes the conservative route and waits out the exponential 
backoff rather than eagerly deleting the file. I feel this is the correct 
tradeoff: waiting 25 seconds is much better than leaving an orphaned data 
file that silently corrupts the table.
   
   # Fixing this test
   We can speed up the test by:
   1. Disabling MDT
   2. Disabling consistency guard (I am more inclined to choose this, as S3, 
whose eventual consistency the config was introduced to guard against, has 
been strongly consistent for a while)
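
Either option amounts to a write-config override in the test. The sketch below only builds the overrides with `java.util.Properties`; `TestSpeedupConfigSketch` and `overrides` are hypothetical names, and the keys `hoodie.metadata.enable` and `hoodie.consistency.check.enabled` are assumed to be the relevant ones and should be checked against the Hudi version in use.

```java
import java.util.Properties;

// Hypothetical helper that builds config overrides for the test; not Hudi code.
class TestSpeedupConfigSketch {

    static Properties overrides(boolean disableMdt, boolean disableConsistencyGuard) {
        Properties props = new Properties();
        if (disableMdt) {
            // Option 1: skip the metadata table (assumed key).
            props.setProperty("hoodie.metadata.enable", "false");
        }
        if (disableConsistencyGuard) {
            // Option 2: skip the consistency guard waits (assumed key).
            props.setProperty("hoodie.consistency.check.enabled", "false");
        }
        return props;
    }

    public static void main(String[] args) {
        // Option 2 only, the one preferred above.
        System.out.println(overrides(false, true));
    }
}
```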
   
   # Open questions
   1. Is MDT being out of sync acceptable in such scenarios? 
   2. While it doesn't cause any silent data corruption (SDC) in our test case 
here, have we verified that it will not cause SDC in other edge cases?
   
   

