voonhous commented on issue #17714: URL: https://github.com/apache/hudi/issues/17714#issuecomment-3691539207
# Main root cause

1. The MDT (metadata table) is stale, causing files that were cleaned in `CLEAN_PLAN_A` to be candidates for `CLEAN_PLAN_B`.
2. `FILEGROUP_A.ver1` is now a candidate in both `CLEAN_PLAN_A` and `CLEAN_PLAN_B`.
3. `FILEGROUP_A.ver1` is deleted once `CLEAN_PLAN_A` has completed.
4. When `CLEAN_PLAN_B` executes, it tries to delete `FILEGROUP_A.ver1` again, but the file no longer exists.

# Interplay with consistency guard

1. Assume we are running the test with **consistency guard = true**, i.e. as if on an eventually consistent file system.
2. Before deletion, `storage.getPathInfo(deletePath).isDirectory()` is called to check whether the path is a directory.
3. This triggers `HoodieHadoopStorage.getPathInfo()` -> `FileSystem.getFileStatus()` -> `HoodieWrapperFileSystem.getFileStatus()`.
4. `HoodieWrapperFileSystem.getFileStatus()` calls `consistencyGuard.waitTillFileAppears(f)`.
5. The file has already been deleted, so the guard keeps waiting with an exponential backoff of **200ms** with 6 retries, up to a maximum of **12,800ms**, cumulatively waiting a total of **25,200ms**.
6. After the exponential backoff, the system is assumed to be consistent, so there is high confidence that the file is indeed deleted rather than "in the midst of being propagated".
7. The file is treated as deleted, and the check ends.

# Verification

I modified the test to disable MDT, and the run time was similar to the run with the consistency guard disabled, at around 40 seconds.

# Final thoughts

This is definitely not a problem with the consistency guard: it is deliberately very conservative, enforcing an exponential backoff on both WRITEs and READs to remove any chance of false positives. For example, a file has been deleted in a previous clean, but the job does not know about this. Since the file may either not have propagated yet or have been deleted, Hudi takes the conservative route and waits out the exponential backoff rather than eagerly deleting the file.
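
To make the 25,200ms figure concrete, here is a minimal Python sketch of one schedule that reproduces the numbers quoted above. It assumes retry `k` (for `k` = 1..6) waits `200 * 2**k` ms, which gives a largest single wait of 12,800ms. The function names `backoff_schedule` and `wait_till_file_appears` are hypothetical and are not Hudi's actual implementation; the real guard may compute its intervals differently.

```python
from typing import Callable, List


def backoff_schedule(base_ms: int = 200, retries: int = 6) -> List[int]:
    """Assumed schedule: the wait before retry k (k = 1..retries) is base_ms * 2**k."""
    return [base_ms * 2 ** k for k in range(1, retries + 1)]


def wait_till_file_appears(
    exists: Callable[[], bool],
    sleep_ms: Callable[[int], None],
    base_ms: int = 200,
    retries: int = 6,
) -> bool:
    """Return True as soon as exists() is true, False once every retry is used up.

    sleep_ms is injected so the sketch can be run without actually sleeping.
    """
    if exists():
        return True
    for wait in backoff_schedule(base_ms, retries):
        sleep_ms(wait)
        if exists():
            return True
    return False


if __name__ == "__main__":
    waited: List[int] = []
    # The file was already removed by the earlier clean, so it never appears.
    found = wait_till_file_appears(lambda: False, waited.append)
    print(found, waited, sum(waited))
    # prints: False [400, 800, 1600, 3200, 6400, 12800] 25200
```

Under this assumed schedule, a file that never appears costs the full 25,200ms before the caller can conclude it is gone, while a file that exists returns on the first check without waiting.
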
This is the correct tradeoff, as I feel that waiting 25 seconds is much better than having an orphaned data file silently corrupting the table.

# Fixing this test

We can speed up the test by either:

1. Disabling MDT
2. Disabling the consistency guard (I am more inclined to choose this, as S3 has been strongly consistent for a while now, and eventual consistency is what the config was introduced to guard against)

# Final thoughts

1. Is MDT being out of sync acceptable in such scenarios?
2. While it doesn't cause any silent data corruption (SDC) in our test case here, have we verified that it will not cause SDC in other edge cases?
