voonhous commented on issue #17714:
URL: https://github.com/apache/hudi/issues/17714#issuecomment-4701910310

   ## Investigation + CI verification
   
   Root-caused and reproduced on macOS; **verified not reproducible on CI/Linux**.
   
   ### Root cause
   
   `CleanActionExecutor.deleteFileAndGetResult` calls 
`storage.getPathInfo(deletePath)` (to pick file-vs-directory delete), which 
routes through `HoodieWrapperFileSystem.getFileStatus`:
   
   ```java
   public FileStatus getFileStatus(Path f) throws IOException {
     return executeFuncWithTimeMetrics(MetricName.getFileStatus.name(), f, () -> {
       try {
         consistencyGuard.waitTillFileAppears(convertToDefaultStoragePath(f)); // full ~25.2s backoff
       } catch (TimeoutException e) {
         // pass
       }
       return fileSystem.getFileStatus(convertToDefaultPath(f));
     });
   }
   ```
   
   On a macOS local `file://` path, `waitTillFileAppears` checks 
`convertToDefaultStoragePath(f)`, never sees the file, and runs the full 
exponential backoff (`0.4 + 0.8 + 1.6 + 3.2 + 6.4 + 12.8 = 25.2s`) for each 
deleted file before swallowing the `TimeoutException` (`// pass`). The file is 
still deleted correctly; the wait is purely wasted wall-clock time.
   
   ### Local (macOS) reproduction
   
   - Full `TestCleanerInsertAndCleanByVersions` (4 methods): ~12 min, ~183 
tasks each stalling ~25.2s.
   - A single method (`testInsertAndCleanByVersions`): ~1 min, ~6 stalls.
   - JDK 11 and JDK 17 behave identically (the backoff is a `Thread.sleep`, so 
it is JDK-independent); the per-method stall count simply scales with how many 
files the method cleans.
   
   ### CI/Linux: not reproducible (verified)
   
   A run with verbose logging (`-Pwarn-log` dropped, 
`-Dsurefire.useFile=false`, so per-task timings are visible):
   
   - 7,109 `Finished task ... in N ms` lines emitted (logging confirmed 
working) and 1,454 cleaner/delete lines (the delete path was genuinely 
exercised).
   - Tasks at ~25s: **0**. Tasks at 10s+: **0**. **Slowest task: 2.87s.** Class 
finished in ~84-120s.
   
   So on CI's Linux filesystem the path resolves immediately and the backoff 
never fires.
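A tally like the one above could be produced with a small scan over the captured test output. The following is only a sketch under stated assumptions: the class `TaskTimingScan`, its methods, and the regular expression are hypothetical, and the exact text between `Finished task` and `in N ms` is elided in this comment, so the pattern accepts anything there.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner, not part of Hudi or the CI tooling.
final class TaskTimingScan {

  // Assumed line shape: "Finished task <anything> in <N> ms".
  private static final Pattern FINISHED =
      Pattern.compile("Finished task .* in (\\d+) ms");

  // Returns the duration in ms from a timing line, or -1 if the line does not match.
  static long durationMs(String line) {
    Matcher m = FINISHED.matcher(line);
    return m.find() ? Long.parseLong(m.group(1)) : -1;
  }

  // Slowest task duration in ms, or -1 if no timing line was found.
  static long slowestMs(List<String> lines) {
    long max = -1;
    for (String line : lines) {
      max = Math.max(max, durationMs(line));
    }
    return max;
  }

  // Number of tasks that took at least thresholdMs.
  static long countAtLeast(List<String> lines, long thresholdMs) {
    long count = 0;
    for (String line : lines) {
      if (durationMs(line) >= thresholdMs) {
        count++;
      }
    }
    return count;
  }
}
```

Calling `countAtLeast(lines, 10_000)` and `slowestMs(lines)` on the captured output would give the "tasks at 10s+" and "slowest task" figures, provided the real timing lines match the assumed pattern.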
   
   ### Conclusion
   
   This is specific to **macOS local `file://` path translation**, not a 
general code defect, and is invisible on CI. The gotcha is now documented 
inline next to `withConsistencyCheckEnabled(true)` in the test so the next 
person hitting a slow local run knows why.
   

