nnguyen168 commented on PR #16595:
URL: https://github.com/apache/iceberg/pull/16595#issuecomment-4575708663
After further investigation and testing, I discovered that the fix for
`listDirRecursivelyWithHadoop` is **not needed**:
## Findings
1. **`listDirRecursivelyWithFileIO`** - Already has the trailing slash
normalization (lines 67-70):
```java
String listPath = dir;
if (!dir.endsWith("/")) {
listPath = dir + "/";
}
Iterable<FileInfo> files = io.listPrefix(listPath); // String-based
prefix operation
```
This fix is effective because `listPrefix()` takes a String and does
prefix-based listing.
2. **`listDirRecursivelyWithHadoop`** - Does NOT need the fix because:
- **Hadoop's `Path` class normalizes paths and strips trailing slashes**
- even if we add `/` to the string, `new Path(normalizedDir)` removes it
- **`fs.listStatus()` is a directory operation**, not a prefix operation
- it's inherently scoped to the directory and won't accidentally include files
from `s3://bucket/table-backup` when listing `s3://bucket/table`
## Test Result
My test `testListDirRecursivelyWithHadoopNormalizesPath` confirmed this:
```
Expecting actual: "tracking://bucket/table"
to end with: "/"
```
The Path object stripped the trailing slash, proving the fix is ineffective
for Hadoop paths.
## Conclusion
The issue #16493 is already addressed by the existing trailing slash
normalization in `listDirRecursivelyWithFileIO`. No changes are needed for
`listDirRecursivelyWithHadoop`.
I'm closing this PR. @rdblue @pvary please let me know if you disagree with
this analysis.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]