nnguyen168 commented on PR #16595:
URL: https://github.com/apache/iceberg/pull/16595#issuecomment-4575708663

   After further investigation and testing, I discovered that the fix for 
`listDirRecursivelyWithHadoop` is **not needed**:
   
   ## Findings
   
   1. **`listDirRecursivelyWithFileIO`** - Already has the trailing slash 
normalization (lines 67-70):
      ```java
      String listPath = dir;
      if (!dir.endsWith("/")) {
        listPath = dir + "/";
      }
      Iterable<FileInfo> files = io.listPrefix(listPath);  // String-based 
prefix operation
      ```
      This fix is effective because `listPrefix()` takes a String and does 
prefix-based listing.
   
   2. **`listDirRecursivelyWithHadoop`** - Does NOT need the fix because:
      - **Hadoop's `Path` class normalizes paths and strips trailing slashes** 
- even if we add `/` to the string, `new Path(normalizedDir)` removes it
      - **`fs.listStatus()` is a directory operation**, not a prefix operation 
- it's inherently scoped to the directory and won't accidentally include files 
from `s3://bucket/table-backup` when listing `s3://bucket/table`
   
   ## Test Result
   My test `testListDirRecursivelyWithHadoopNormalizesPath` confirmed this:
   ```
   Expecting actual: "tracking://bucket/table"
   to end with: "/"
   ```
   The Path object stripped the trailing slash, proving the fix is ineffective 
for Hadoop paths.
   
   ## Conclusion
   The issue #16493 is already addressed by the existing trailing slash 
normalization in `listDirRecursivelyWithFileIO`. No changes are needed for 
`listDirRecursivelyWithHadoop`.
   
   I'm closing this PR. @rdblue @pvary please let me know if you disagree with 
this analysis.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to