Baunsgaard opened a new issue, #15172:
URL: https://github.com/apache/iceberg/issues/15172

   ### Apache Iceberg version
   
   1.10.1 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   
   `RewriteTablePathUtil.relativize()` throws `IllegalArgumentException` when 
the path equals the prefix exactly, rather than being a child path under it. 
This breaks `rewrite_table_path` when table properties such as 
`write.data.path` or `write.metadata.path` point to the table root.
   
   ### Expected behavior
   
   `relativize("/path/to/table", "/path/to/table")` should return `""` (empty 
string representing root).
   
   ### Actual behavior
   
   Throws `IllegalArgumentException: Path /path/to/table does not start with 
/path/to/table/`
   
   ### Root Cause
   
   
[`RewriteTablePathUtil.relativize()`](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java#L758-L765)
 appends `/` to the prefix before checking `startsWith()`:
   
   ```java
   public static String relativize(String path, String prefix) {
     String toRemove = maybeAppendFileSeparator(prefix);  // "/table" -> "/table/"
     if (!path.startsWith(toRemove)) {
       throw new IllegalArgumentException(...);  // FAILS when path == prefix
     }
     return path.substring(toRemove.length());
   }
   ```
   
   ### Steps to reproduce
   
   ```java
   String prefix = "/path/to/table";
   
   // Works - path is under prefix
   RewriteTablePathUtil.relativize("/path/to/table/data/file.parquet", prefix);  // ✓ returns "data/file.parquet"
   
   // Fails - path equals prefix (e.g., write.data.path = table root)
   RewriteTablePathUtil.relativize("/path/to/table", prefix);  // ✗ throws IllegalArgumentException
   ```
   
   ### Use Case
   
   This affects `rewrite_table_path` for tables where `write.data.path` or 
`write.metadata.path` equals the table root:
   
   ```sql
   -- Table with write.data.path set to table root (valid configuration)
   CREATE TABLE catalog.db.events (id BIGINT, data STRING)
   USING iceberg
   LOCATION 's3://bucket/warehouse/db/events'
   TBLPROPERTIES ('write.data.path' = 's3://bucket/warehouse/db/events');
   
   -- Replicating table to DR region:
   CALL catalog.system.rewrite_table_path(
     'db.events',                              -- table
     's3://bucket/warehouse/db/events',        -- source_prefix (table location)
     's3://bucket-dr/warehouse/db/events'      -- target_prefix
   );
   -- Fails when processing write.data.path property in updatePathInProperty():
   -- IllegalArgumentException: Path s3://bucket/warehouse/db/events 
   --   does not start with s3://bucket/warehouse/db/events/
   ```
   
   **Affected scenarios:**
   - **Storage migration**: Moving tables between buckets or storage systems
   - **Backup/restore**: Archiving table metadata to different locations
   
   The `location` field itself works (it uses `replaceFirst`), but path 
properties go through `relativize()`, which fails on this edge case.
   
   ### Proposed Fix
   
   Handle the case where path equals prefix by checking after normalization:
   
   ```java
   public static String relativize(String path, String prefix) {
     String toRemove = maybeAppendFileSeparator(prefix);
     
     if (path.startsWith(toRemove)) {
       return path.substring(toRemove.length());
     }
     
     // Handle exact match where path equals prefix (without trailing separator)
     if (maybeAppendFileSeparator(path).equals(toRemove)) {
       return "";
     }
     
     throw new IllegalArgumentException(
         String.format("Path %s does not start with %s", path, toRemove));
   }
   ```
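
The proposed behavior can be checked in isolation with a minimal, self-contained sketch. The class name `RelativizeSketch` is hypothetical, and `maybeAppendFileSeparator` here is a local stand-in that is assumed to append `/` only when it is missing; it is not copied from the Iceberg source.

```java
public final class RelativizeSketch {
  private static final String FILE_SEPARATOR = "/";

  private RelativizeSketch() {}

  // Local stand-in (assumption): append "/" only when the path does not already end with it.
  public static String maybeAppendFileSeparator(String path) {
    return path.endsWith(FILE_SEPARATOR) ? path : path + FILE_SEPARATOR;
  }

  // Mirrors the fix proposed above: child paths are relativized,
  // an exact match yields "", anything else is rejected.
  public static String relativize(String path, String prefix) {
    String toRemove = maybeAppendFileSeparator(prefix);

    if (path.startsWith(toRemove)) {
      return path.substring(toRemove.length());
    }

    // Exact match where path equals prefix (without trailing separator)
    if (maybeAppendFileSeparator(path).equals(toRemove)) {
      return "";
    }

    throw new IllegalArgumentException(
        String.format("Path %s does not start with %s", path, toRemove));
  }

  public static void main(String[] args) {
    String prefix = "/path/to/table";

    // Child path under the prefix: prints "data/file.parquet"
    System.out.println(relativize("/path/to/table/data/file.parquet", prefix));

    // Path equal to the prefix: prints "true" because the result is ""
    System.out.println(relativize("/path/to/table", prefix).isEmpty());

    // A sibling that merely shares a string prefix is still rejected
    try {
      relativize("/path/to/table2/data", prefix);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The last case matters because the exact-match branch compares both sides after appending the separator, so `/path/to/table2/...` does not slip through as if it were under `/path/to/table`.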
   
   I can submit a PR with this fix and tests.
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

