stevenzwu commented on code in PR #16174:
URL: https://github.com/apache/iceberg/pull/16174#discussion_r3228764738


##########
core/src/main/java/org/apache/iceberg/util/LocationUtil.java:
##########
@@ -57,4 +57,59 @@ public static String tableLocation(TableIdentifier 
tableIdentifier, boolean useU
       return tableIdentifier.name();
     }
   }
+
+  /**
+   * Returns true if the location contains a URI scheme (e.g. {@code s3:}, 
{@code hdfs:}, {@code
+   * file:}), per <a 
href="https://datatracker.ietf.org/doc/html/rfc3986#section-3.1";>RFC 3986
+   * section 3.1</a>.
+   */
+  private static boolean hasScheme(String location) {
+    if (location.isEmpty()) {
+      return false;
+    }
+
+    // Early termination for relative locations since most commonly start with 
/
+    if (location.charAt(0) == '/') {
+      return false;
+    }
+
+    for (int i = 0; i < location.length(); i += 1) {
+      char ch = location.charAt(i);
+      if (ch == ':') {
+        return i > 0;
+      }
+
+      if (!Character.isLetterOrDigit(ch) && ch != '+' && ch != '-' && ch != 
'.') {
+        return false;
+      }
+    }
+
+    return false;
+  }
+
+  /**
+   * Resolves a relative location against a table location. If the location 
has a URI scheme, it is

Review Comment:
   > Resolves a relative location
   
   This seems inaccurate, since the location can be an absolute. maybe just 
simply drop the `relative` word.



##########
core/src/main/java/org/apache/iceberg/util/LocationUtil.java:
##########
@@ -57,4 +57,59 @@ public static String tableLocation(TableIdentifier 
tableIdentifier, boolean useU
       return tableIdentifier.name();
     }
   }
+
+  /**
+   * Returns true if the location contains a URI scheme (e.g. {@code s3:}, 
{@code hdfs:}, {@code
+   * file:}), per <a 
href="https://datatracker.ietf.org/doc/html/rfc3986#section-3.1";>RFC 3986
+   * section 3.1</a>.
+   */
+  private static boolean hasScheme(String location) {
+    if (location.isEmpty()) {
+      return false;
+    }
+
+    // Early termination for relative locations since most commonly start with 
/
+    if (location.charAt(0) == '/') {
+      return false;
+    }
+
+    for (int i = 0; i < location.length(); i += 1) {
+      char ch = location.charAt(i);
+      if (ch == ':') {
+        return i > 0;
+      }
+
+      if (!Character.isLetterOrDigit(ch) && ch != '+' && ch != '-' && ch != 
'.') {
+        return false;
+      }
+    }
+
+    return false;
+  }
+
+  /**
+   * Resolves a relative location against a table location. If the location 
has a URI scheme, it is
+   * returned as-is. Otherwise, the location is appended to the table location 
without any
+   * additional separator.
+   */
+  public static String resolveLocation(String tableLocation, String location) {
+    if (hasScheme(location)) {
+      return location;
+    }
+
+    return tableLocation + location;
+  }
+
+  /**
+   * Relativizes a location against a table location. If the location starts 
with the table
+   * location, the prefix is removed and the remaining relative portion is 
returned. Otherwise, the
+   * location is returned as-is.
+   */
+  public static String relativizeLocation(String tableLocation, String 
location) {
+    if (location.startsWith(tableLocation)) {

Review Comment:
   Prefix-collision bug: `startsWith` matches sibling locations that share a 
prefix with the table location.
   
   ```
   relativizeLocation("s3://bucket/table", 
"s3://bucket/table_v2/data/00000-0.parquet")
     -> "_v2/data/00000-0.parquet"   // wrong; this location is NOT under 
tableLocation
   ```
   
   The new `testRelativizeLocationNotUnderTableLocation` covers 
different-bucket and different-path cases, but neither shares a prefix with 
`table`, so this case still slips through. Worth a test for the `table` vs 
`table_v2` style collision.
   
   One possible solution
   ```java
     if (location.startsWith(tableLocation)) {
       String rest = location.substring(tableLocation.length());
       if (rest.isEmpty() || rest.startsWith("/")) {
         return rest;
       }
     }
   
     return location;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to