wombatu-kun opened a new pull request, #16498: URL: https://github.com/apache/iceberg/pull/16498
## Summary

Closes #16493.

`DeleteOrphanFilesSparkAction.filteredCompareToFileList()` previously scoped a user-supplied `compareToFileList` to the action's `location` field using a raw `files.col(FILE_PATH).startsWith(location)` filter. When `location` lacks a trailing path separator (the production-typical shape for storage URIs like `s3://bucket/table` returned by `Table.location()`), that filter also accepts sibling paths such as `s3://bucket/table-backup/...`. Files in those sibling directories then entered the orphan candidate set and could be deleted.

This PR normalizes the prefix to directory form via `LocationUtil.stripTrailingSlash(location) + "/"` before the `startsWith` filter. The same `+ "/"` shape is already used in `SnapshotTableSparkAction` (lines 131-132) to prevent identical sibling-prefix collisions, so this aligns the orphan-files action with that existing precedent.

The fix is applied symmetrically to all three currently supported Spark version trees (v3.5, v4.0, v4.1). Their source files were byte-identical for this method, so the patch is mechanical.

The directory-listing path (`listedFileDS()`) is unaffected: it uses Hadoop's `FileSystem.listStatus` from a single root, which is inherently bounded to that directory.
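For illustration, the sibling-prefix collision can be reproduced with plain string matching, outside Spark. The following is a minimal sketch, not the patched Iceberg code: the class `PrefixScope`, its methods, and the example file names are all hypothetical, and the local `stripTrailingSlash` is a simplified stand-in for `LocationUtil.stripTrailingSlash` that may not match the real utility on edge cases such as a bare scheme root.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-alone illustration; none of these names come from the Iceberg source.
final class PrefixScope {

  // Simplified stand-in for LocationUtil.stripTrailingSlash (assumption):
  // removes every trailing '/' from the location string.
  static String stripTrailingSlash(String location) {
    String result = location;
    while (result.endsWith("/")) {
      result = result.substring(0, result.length() - 1);
    }
    return result;
  }

  // Raw prefix match: mirrors the shape of the old filter.
  static List<String> rawFilter(String location, List<String> paths) {
    return paths.stream()
        .filter(path -> path.startsWith(location))
        .collect(Collectors.toList());
  }

  // Directory-form prefix match: mirrors the shape of the fix,
  // so the prefix always ends with exactly one '/'.
  static List<String> directoryFilter(String location, List<String> paths) {
    String prefix = stripTrailingSlash(location) + "/";
    return paths.stream()
        .filter(path -> path.startsWith(prefix))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Example paths are made up for this sketch.
    List<String> paths = List.of(
        "s3://bucket/table/data/a.parquet",
        "s3://bucket/table-backup/data/b.parquet");

    // The raw filter keeps both paths, including the sibling directory's file.
    System.out.println(rawFilter("s3://bucket/table", paths));
    // The directory-form filter keeps only the file under s3://bucket/table/.
    System.out.println(directoryFilter("s3://bucket/table", paths));
  }
}
```

Stripping before appending `"/"` means a location that already ends with a separator produces the same prefix as one that does not, so both input shapes are scoped identically.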
