stevenzwu commented on code in PR #16174:
URL: https://github.com/apache/iceberg/pull/16174#discussion_r3229747156


##########
core/src/main/java/org/apache/iceberg/util/LocationUtil.java:
##########
@@ -57,4 +57,59 @@ public static String tableLocation(TableIdentifier 
tableIdentifier, boolean useU
       return tableIdentifier.name();
     }
   }
+
+  /**
+   * Returns true if the location contains a URI scheme (e.g. {@code s3:}, 
{@code hdfs:}, {@code
+   * file:}), per <a 
href="https://datatracker.ietf.org/doc/html/rfc3986#section-3.1";>RFC 3986
+   * section 3.1</a>.
+   */
+  private static boolean hasScheme(String location) {
+    if (location.isEmpty()) {
+      return false;
+    }
+
+    // Early termination for relative locations since most commonly start with 
/
+    if (location.charAt(0) == '/') {
+      return false;
+    }
+
+    for (int i = 0; i < location.length(); i += 1) {
+      char ch = location.charAt(i);
+      if (ch == ':') {
+        return i > 0;
+      }
+
+      if (!Character.isLetterOrDigit(ch) && ch != '+' && ch != '-' && ch != 
'.') {
+        return false;
+      }
+    }
+
+    return false;
+  }
+
+  /**
+   * Resolves a relative location against a table location. If the location 
has a URI scheme, it is
+   * returned as-is. Otherwise, the location is appended to the table location 
without any
+   * additional separator.
+   */
+  public static String resolveLocation(String tableLocation, String location) {
+    if (hasScheme(location)) {
+      return location;
+    }
+
+    return tableLocation + location;
+  }
+
+  /**
+   * Relativizes a location against a table location. If the location starts 
with the table
+   * location, the prefix is removed and the remaining relative portion is 
returned. Otherwise, the
+   * location is returned as-is.
+   */
+  public static String relativizeLocation(String tableLocation, String 
location) {
+    if (location.startsWith(tableLocation)) {

Review Comment:
   The issue here isn't really about whether `/` is a separator — at the 
storage layer, S3 is a flat keyspace (bucket + opaque object key), and there's 
no folder concept in S3 itself. Requiring path-separator semantics at that 
level isn't right.
   
   The actual problem with byte-prefix `startsWith` is **relocation 
correctness**, which is the whole purpose of supporting relative paths.
   
   Same-location round-trip looks fine in isolation:
   
   ```
   tableLocation = "s3://bucket/table"
   file          = "s3://bucket/table_v2/file"   (a sibling of the table, not a 
child)
   relativize    -> "_v2/file"
   resolve back  -> "s3://bucket/table_v2/file"  ✓ round-trips
   ```
   
   But move the table to `s3://newbucket/newtable` and resolve the same stored 
relative form:
   
   ```
   resolve("s3://newbucket/newtable", "_v2/file")
     -> "s3://newbucket/newtable_v2/file"
   ```
   
   The file that originally lived at `s3://bucket/table_v2/file` (a sibling of 
the old table) now resolves to `s3://newbucket/newtable_v2/file`, with no 
reason that key should exist or contain the right data. A file that isn't 
structurally a child of the table location cannot survive relocation when 
stored as a relative path, and the whole point of relative paths is to make 
relocation safe.
   
   The spec PR's rule is "If a file's absolute path shares a common prefix with 
the table location, the relative portion should be stored. Otherwise, the 
absolute path should be stored." — but it doesn't pin down what "common prefix" 
means. Byte-prefix permits the case above; a boundary/segment-aware notion 
forbids it. The relocation argument points to the latter — otherwise relative 
paths are unsafe for any sibling-prefix layout.
   
   cc @rdblue / @danielcweeks: should `s3://bucket/table_v2/file` be 
relativizable against `s3://bucket/table`? If not, the spec should make that 
explicit, and `relativizeLocation` should refuse to produce a relative form in 
that case.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to