Baunsgaard commented on code in PR #15173:
URL: https://github.com/apache/iceberg/pull/15173#discussion_r2761412742
##########
core/src/test/java/org/apache/iceberg/TestRewriteTablePathUtil.java:
##########
@@ -80,4 +81,157 @@ public void testStagingPathWithNoMiddlePart() {
RewriteTablePathUtil.stagingPath(fileDirectlyUnderPrefix,
sourcePrefix, stagingDir);
assertThat(newMethodResult).isEqualTo("/staging/file.parquet");
}
+
+ @Test
+ public void testRelativize() {
+ // Normal case: path is under prefix
+ assertThat(RewriteTablePathUtil.relativize("/a/b/c",
"/a")).isEqualTo("b/c");
+ assertThat(RewriteTablePathUtil.relativize("/a/b", "/a")).isEqualTo("b");
+
+ // Edge case: path equals prefix exactly (issue #15172)
+ assertThat(RewriteTablePathUtil.relativize("/a", "/a")).isEqualTo("");
+ assertThat(RewriteTablePathUtil.relativize("s3://bucket/warehouse",
"s3://bucket/warehouse"))
+ .isEqualTo("");
+
+ // Trailing separator variations - all combinations should work
+ assertThat(RewriteTablePathUtil.relativize("/a/", "/a")).isEqualTo("");
+ assertThat(RewriteTablePathUtil.relativize("/a/", "/a/")).isEqualTo("");
+ assertThat(RewriteTablePathUtil.relativize("/a", "/a/")).isEqualTo("");
+ }
+
+ @Test
+ public void testRelativizeInvalid() {
+ // Path does not start with prefix
+ assertThatThrownBy(() -> RewriteTablePathUtil.relativize("/other/path",
"/source/table"))
+ .isInstanceOf(IllegalArgumentException.class)
+ .hasMessageContaining("does not start with");
+
+ // Overlapping names: /table-old should NOT match prefix /table
+ assertThatThrownBy(() ->
RewriteTablePathUtil.relativize("/table-old/data", "/table"))
+ .isInstanceOf(IllegalArgumentException.class)
+ .hasMessageContaining("does not start with");
+ }
+
+ @Test
+ public void testNewPath() {
+ // Normal case: path is under prefix
+ assertThat(RewriteTablePathUtil.newPath("/src/data/file.parquet", "/src",
"/tgt"))
+ .isEqualTo("/tgt/data/file.parquet");
+
+ // Trailing separator on path
+ assertThat(RewriteTablePathUtil.newPath("/src/data/", "/src",
"/tgt")).isEqualTo("/tgt/data/");
+
+ // Both path and prefix with trailing separator
+ assertThat(RewriteTablePathUtil.newPath("/src/", "/src/",
"/tgt")).isEqualTo("/tgt/");
+ }
+
+ @Test
+ public void testNewPathEqualsPrefix() {
+ // Issue #15172: path equals prefix (e.g., write.data.path = table
location)
+ // Result has trailing separator since directories are represented with
trailing separators
+ assertThat(RewriteTablePathUtil.newPath("/src", "/src",
"/tgt")).isEqualTo("/tgt/");
Review Comment:
This isn't a behavior change. Previously `newPath("/src", "/src", "/tgt")`
would throw because `relativize()` didn't handle `path == prefix`. Now it
works, and the trailing slash is consistent with how `combinePaths` has always
worked.
Note that the final argument in `newPath` is the `targetPrefix`; it is a
folder, and therefore, always appended with a file separator if not given in
the argument:
The current code:
```java
public static String newPath(String path, String sourcePrefix, String
targetPrefix) {
return combinePaths(targetPrefix, relativize(path, sourcePrefix));
}
/** Combine a base and relative path. */
public static String combinePaths(String absolutePath, String
relativePath) {
String base = maybeAppendFileSeparator(absolutePath); // <- SEE HERE!
return relativePath.isEmpty() ? base : base + relativePath;
}
```
What is confusing is the naming of the arguments, since it does not convey
what is happening. It is indeed not intuitive that `targetPrefix` must be a
folder.
If you want to change this to something more intuitive, you would modify the
return of `combinePaths` to return `absolutePath` if the relative path is empty:
```java
public static String combinePaths(String absolutePath, String
relativePath) {
return relativePath.isEmpty() ? absolutePath :
maybeAppendFileSeparator(absolutePath) + relativePath;
}
```
Would you like this change? (I don't know if it breaks something)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]