mxm commented on code in PR #15630:
URL: https://github.com/apache/iceberg/pull/15630#discussion_r3087627568


##########
format/spec.md:
##########
@@ -123,6 +139,35 @@ Tables do not require random-access writes. Once written, data and metadata file
 
 Tables do not require rename, except for tables that use atomic rename to implement the commit operation for new metadata files.
 
+### Paths in Metadata
+
+Path strings stored in Iceberg metadata files are classified as one of two types:
+
+* **Absolute path** -- A path string that includes a [URI scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., `s3://`, `gs://`, `hdfs://`, `file:///`). Absolute paths are used as-is without modification.
+* **Relative path** -- A path string that does not include a URI scheme. Relative paths must be resolved against the table's base location before use.
+
+Prior to v4, all path fields must contain absolute paths. Starting with v4, path fields may contain either absolute or relative paths. Directory navigation symbols (`.` and `..`) and other file system conventions are not supported in relative paths.
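For illustration, the absolute/relative classification and the resolution rule in this hunk can be sketched in Python. This is a minimal sketch, not PyIceberg's or the Java library's implementation: `is_absolute` and `resolve` are hypothetical helpers, and the rule for joining a relative path onto the base location (exactly one `/` between them) is an assumption, since the spec text does not define it.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ),
# terminated by ":". A path that starts with a scheme is treated as absolute.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def is_absolute(path: str) -> bool:
    """True if the path string begins with a URI scheme (e.g. s3://, file:///)."""
    return _SCHEME.match(path) is not None


def resolve(base_location: str, path: str) -> str:
    """Return absolute paths unchanged; join relative paths onto the base location.

    Assumption (not from the spec): exactly one "/" separates the base
    location and the relative path.
    """
    if is_absolute(path):
        return path
    base = base_location if base_location.endswith("/") else base_location + "/"
    return base + path


base = "s3://bucket/warehouse/db/tbl"
print(resolve(base, "gs://other-bucket/data/a.parquet"))  # used as-is
print(resolve(base, "data/00000-0-a.parquet"))            # joined onto base
```

Because RFC 3986 forbids a colon in the first segment of a relative-path reference, a string such as `a:b/c` is classified as absolute by this check.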

Review Comment:
   @danielcweeks A few points to clarify:
   
   1. Absolute paths today can already contain file system directives. 
   2. If we want the spec to state that file system directives should not be used, we should probably state it for both relative and absolute paths. Currently, the statement only covers relative paths.
   3. I'm not suggesting that we reject file system directives; we can't, because the table metadata merely stores strings.
   4. It is up to the FileIO implementation to reject directives (or not).
     a. In the case of S3, it would be wrong to reject `..` or similar sequences, because they are legitimate path elements.
     b. Implementations for other file systems are free to decide.
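   Points 3 and 4 can be illustrated with a minimal Python sketch of FileIO-side validation. `ObjectStoreIO` and `HierarchicalFileSystemIO` are hypothetical classes, not Iceberg's `FileIO` API; the sketch only shows that the same stored string can be valid for one storage system and rejected by another.

```python
class ObjectStoreIO:
    """Hypothetical FileIO for an object store such as S3.

    Keys are opaque strings, so "." and ".." are ordinary characters and
    "tbl/../a.parquet" names a distinct key; nothing is rejected.
    """

    def validate(self, path: str) -> None:
        return None


class HierarchicalFileSystemIO:
    """Hypothetical FileIO for a file system that interprets "." and ".." as directives."""

    DIRECTIVES = {".", ".."}

    def validate(self, path: str) -> None:
        # Reject any "/"-separated segment that the file system would interpret
        # as "current directory" or "parent directory".
        for segment in path.split("/"):
            if segment in self.DIRECTIVES:
                raise ValueError(f"path contains file system directive {segment!r}: {path}")


ObjectStoreIO().validate("s3://bucket/tbl/../data/a.parquet")  # accepted
try:
    HierarchicalFileSystemIO().validate("file:///warehouse/tbl/../data/a.parquet")
except ValueError as err:
    print(err)
```

   The table metadata itself stays a plain string in both cases; only the FileIO decides whether the string is acceptable.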
   
   I tried to convey that in the suggested change above. Refining here:
   
   ```suggestion
   Prior to v4, all path fields must contain absolute paths. Starting with v4, path fields may contain either absolute or relative paths. Paths (relative or absolute) must not include symbols that the underlying file system interprets as directives (e.g., `..` in file systems that treat it as the parent directory). It is up to the file system implementation to validate these paths.
   ```
   
   Does that make sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
