bolma-lila opened a new pull request, #16656:
URL: https://github.com/apache/iceberg/pull/16656
## Description
`S3FileIO.listPrefix()` currently appends a trailing slash to the prefix only
for S3 Directory Buckets (via `toDirectoryPath()`), not for standard S3
buckets. As a result, the prefix passed to `ListObjectsV2` can match sibling
keys that merely share the same leading string.
## Problem
Given a table location `s3://bucket/warehouse/ns/table`, calling
`listPrefix("s3://bucket/warehouse/ns/table")` sets `prefix=warehouse/ns/table`
in the `ListObjectsV2Request`. S3 returns **all** keys starting with that
string, including:
- `warehouse/ns/table/metadata/...` (correct: belongs to this table)
- `warehouse/ns/table_archive/data/...` (**wrong**: a different table with a
similar name)
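To make the over-matching concrete, here is a minimal sketch that mimics the prefix filter with plain `String.startsWith`. It does not call S3; the class name `PrefixMatchDemo` and the two full object keys are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: mimics a ListObjectsV2-style prefix filter in memory. */
final class PrefixMatchDemo {

  // A key is returned when it starts with the requested prefix.
  static List<String> matching(List<String> keys, String prefix) {
    return keys.stream()
        .filter(key -> key.startsWith(prefix))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical keys: one under the table, one under a sibling table.
    List<String> keys = List.of(
        "warehouse/ns/table/metadata/v1.metadata.json",
        "warehouse/ns/table_archive/data/00000.parquet");

    // Without the trailing slash, both keys match.
    System.out.println(matching(keys, "warehouse/ns/table"));
    // With the trailing slash, only the key under the table matches.
    System.out.println(matching(keys, "warehouse/ns/table/"));
  }
}
```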
The missing trailing slash also breaks STS credential vending from REST
catalogs like [Lakekeeper](https://github.com/lakekeeper/lakekeeper), which
scope the session policy to:
```json
{"Condition": {"StringLike": {"s3:prefix": "warehouse/ns/table/*"}}}
```
A prefix without a trailing slash does not match the `StringLike` pattern
`warehouse/ns/table/*`, so the list request fails with `403 AccessDenied`.
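The mismatch can be reproduced locally with a rough approximation of `StringLike` matching. This sketch assumes the usual wildcard semantics (`*` for zero or more characters, `?` for exactly one) and ignores policy variables; `StringLikeDemo` and `stringLike` are hypothetical names, not part of any AWS SDK.

```java
import java.util.regex.Pattern;

/** Illustrative only: approximates IAM StringLike with a glob-to-regex conversion. */
final class StringLikeDemo {

  // Translates '*' and '?' to regex wildcards and quotes every other character.
  static boolean stringLike(String pattern, String value) {
    StringBuilder regex = new StringBuilder();
    for (char c : pattern.toCharArray()) {
      if (c == '*') {
        regex.append(".*");
      } else if (c == '?') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
  }

  public static void main(String[] args) {
    String policyPattern = "warehouse/ns/table/*";
    // No trailing slash: the literal '/' in the pattern has nothing to match.
    System.out.println(stringLike(policyPattern, "warehouse/ns/table"));
    // Trailing slash: '*' matches the empty remainder.
    System.out.println(stringLike(policyPattern, "warehouse/ns/table/"));
  }
}
```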
## Fix
Apply `toDirectoryPath()` unconditionally in `listPrefix()`, not just for
Directory Buckets. This ensures the key always ends with `/` before being used
as a ListObjects prefix, matching how [Trino normalizes
keys](https://github.com/trinodb/trino/blob/master/lib/trino-filesystem-s3/src/main/java/io/trino/filesystem/s3/S3FileSystem.java#L404-L410)
via `directoryKey()`.
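The normalization itself is small. The sketch below is not the actual `toDirectoryPath()` or `directoryKey()` code; `DirectoryPrefix.ensureTrailingSlash` is a hypothetical helper, and leaving an empty key unchanged (so a bucket-root listing keeps an empty prefix) is an assumption of this sketch.

```java
/** Illustrative only: a hypothetical helper, not the Iceberg or Trino implementation. */
final class DirectoryPrefix {

  // Appends '/' unless the key is empty or already ends with one.
  static String ensureTrailingSlash(String key) {
    if (key.isEmpty() || key.endsWith("/")) {
      return key;
    }
    return key + "/";
  }

  public static void main(String[] args) {
    // A key without a trailing slash gets one appended.
    System.out.println(ensureTrailingSlash("warehouse/ns/table"));
    // An already-normalized key is returned unchanged, so the call is idempotent.
    System.out.println(ensureTrailingSlash("warehouse/ns/table/"));
  }
}
```

Because the helper is idempotent, callers that already pass a directory-style location should see no change in behavior.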
## Testing
- Existing integration test (`TestS3FileIO.testListPrefix`) creates objects
under `prefix/<scale>/` subdirectories, so the normalized prefix
`path/to/list/` still matches all expected objects
- Verified manually with vended STS credentials:
`ListObjectsV2(prefix="key/")` succeeds where `ListObjectsV2(prefix="key")`
returns 403
## Related
- Lakekeeper discussion: https://github.com/lakekeeper/lakekeeper/issues/1795
- Airbyte companion fix: https://github.com/airbytehq/airbyte/pull/78624