mxm commented on code in PR #15630:
URL: https://github.com/apache/iceberg/pull/15630#discussion_r3022410921


##########
format/spec.md:
##########
@@ -123,6 +139,35 @@ Tables do not require random-access writes. Once written, 
data and metadata file
 
 Tables do not require rename, except for tables that use atomic rename to 
implement the commit operation for new metadata files.
 
+### Paths in Metadata
+
+Path strings stored in Iceberg metadata files are classified as one of two 
types:
+
+* **Absolute path** -- A path string that includes a [URI 
scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., 
`s3://`, `gs://`, `hdfs://`, `file:///`). Absolute paths are used as-is without 
modification.
+* **Relative path** -- A path string that does not include a URI scheme. 
Relative paths must be resolved against the table's base location before use.
+
+Prior to v4, all path fields must contain absolute paths. Starting with v4, 
path fields may contain either absolute or relative paths. Directory navigation 
symbols (`.` and `..`) and other file system conventions are not supported in 
relative paths.

Review Comment:
   This issue applies to paths in general, not only to the new relative paths. 
Today, absolute paths could contain "directory navigation symbols", if the 
corresponding FileIO treats them like it. 
   
   I think we could state that paths (relative or absolute) must be normalized:
   
   ```suggestion
   Prior to v4, all path fields must contain absolute paths. Starting with v4, 
path fields may contain either absolute or relative paths. All paths must be 
normalized, i.e. special symbols (`.`, `..`, etc.) must be resolved before 
storing the path.
   ```
   
   +1 for leaving it up to the FileIO to enforce this.



##########
format/spec.md:
##########
@@ -1637,6 +1683,31 @@ The binary single-value serialization can be used to 
store the lower and upper b
 
 ## Appendix E: Format version changes
 
+### Version 4
+
+Relative path support is added in v4.
+
+Reading v3 metadata for v4:
+
+* All path fields are treated as absolute paths
+* Table metadata field `location` is always present
+
+Writing v4 metadata:
+
+* Table metadata JSON:
+    * `location` is now optional
+    * When present, `location` must be an absolute path
+    * When not present, the table location must be managed externally and 
provided when loading the metadata
+* Path fields in all metadata structures may contain relative paths
+* Writers should produce relative paths by default for files that reside under 
the table location
+* Absolute paths must be used for files that do not share a common prefix with 
the table location
+
+Reading v4 metadata:
+
+* Readers must check whether path fields contain a URI scheme to determine if 
a path is absolute or relative
+* Relative paths must be resolved against the table location before use (see 
[Path Resolution](#path-resolution))
+* When `location` is omitted, the table location must be provided (see [Table 
Location Specification](#table-location-specification)

Review Comment:
   ```suggestion
   * When `location` is omitted, the table location must be provided (see 
[Table Location Specification](#table-location-specification))
   ```
   
   NIT missing parenthesis



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to