mxm commented on code in PR #15630: URL: https://github.com/apache/iceberg/pull/15630#discussion_r3028480832
##########
format/spec.md:
##########
@@ -123,6 +139,35 @@ Tables do not require random-access writes. Once written, data and metadata file
 Tables do not require rename, except for tables that use atomic rename to implement the commit operation for new metadata files.
 
+### Paths in Metadata
+
+Path strings stored in Iceberg metadata files are classified as one of two types:
+
+* **Absolute path** -- A path string that includes a [URI scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., `s3://`, `gs://`, `hdfs://`, `file:///`). Absolute paths are used as-is without modification.
+* **Relative path** -- A path string that does not include a URI scheme. Relative paths must be resolved against the table's base location before use.
+
+Prior to v4, all path fields must contain absolute paths. Starting with v4, path fields may contain either absolute or relative paths. Directory navigation symbols (`.` and `..`) and other file system conventions are not supported in relative paths.

Review Comment:
I agree that Iceberg should not normalize paths. Users must ensure that the paths they use are already normalized according to the rules of the file system. FileIO should detect and reject any non-normalized path, at least where normalization would significantly alter the path (as opposed to something like a simple trailing slash).

As you pointed out, `s3://bucket/a/b/c` is not a normalization of `s3://bucket/a/b//c`. They are two separate paths (both are already normalized, actually). However, for most file systems, `file:///files/` is a normalization of `file:///my/../files/`. FileIO should reject the latter path.

I think it is worth pointing out that absolute paths have the same constraints as relative paths:

```suggestion
Prior to v4, all path fields must contain absolute paths. Starting with v4, path fields may contain either absolute or relative paths. Paths (relative or absolute) must not include symbols which the underlying file system interprets as directives (e.g. `.` or `..` in local file systems). It is up to the file system implementation to reject these paths.
```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
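
The check proposed in the review comment above could look roughly like the following. This is a minimal sketch, assuming a `/`-separated path in which `.` and `..` segments act as directives, as on a local file system. The names `has_directive_segments` and `check_path` are hypothetical and are not part of Iceberg's FileIO API; a real implementation would apply the rules of its own file system.

```python
# Hypothetical helpers for illustration only; not part of Iceberg or FileIO.
DIRECTIVE_SEGMENTS = {".", ".."}


def has_directive_segments(path: str) -> bool:
    """Return True if any '/'-separated segment of the path is '.' or '..'."""
    # Drop the URI scheme, if one is present, so that only the remainder
    # (authority plus path, or the whole relative path) is inspected.
    _, separator, remainder = path.partition("://")
    body = remainder if separator else path
    return any(segment in DIRECTIVE_SEGMENTS for segment in body.split("/"))


def check_path(path: str) -> str:
    """Return the path unchanged, or raise ValueError instead of normalizing it."""
    if has_directive_segments(path):
        raise ValueError(f"Path contains '.' or '..' segments: {path}")
    return path
```

Under this sketch, `s3://bucket/a/b//c` is accepted because an empty segment is not a directive, while `file:///my/../files/` is rejected. The path is never rewritten, which matches the position that Iceberg should not normalize paths.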
