krisnaru commented on code in PR #14355:
URL: https://github.com/apache/iceberg/pull/14355#discussion_r3186041706


##########
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java:
##########
@@ -608,6 +610,43 @@ private static PositionDelete newPositionDeleteRecord(
     return delete;
   }
 
+  /**
+   * Lookup the longest matching prefix mapping for a given path.
+   *
+   * @param path the path to find a prefix mapping for
+   * @param prefixMappings map of source prefix to target prefix mappings
+   * @return the Map.Entry with the longest matching source prefix, or null if no match found
+   */
+  public static Map.Entry<String, String> lookupPrefixMappings(
+          String path, Map<String, String> prefixMappings) {
+    if (prefixMappings == null || prefixMappings.isEmpty() || path == null) {
+      return null;
+    }
+
+    // Sort entries by key length in descending order to find the longest matching prefix
+    return prefixMappings.entrySet().stream()
+            .filter(entry -> path.startsWith(entry.getKey()))
+            .max(java.util.Comparator.comparing(entry -> entry.getKey().length()))
+            .orElse(null);

Review Comment:
    Suppose you have multiple prefix mappings like:

   ```
     s3://bucket/warehouse/         → s3://new-bucket/warehouse/
     s3://bucket/warehouse/db/tbl/  → s3://other-bucket/data/
   ```

    For the file path s3://bucket/warehouse/db/tbl/data.parquet, both
    prefixes match (both pass the startsWith check). Picking the longest
    matching prefix ensures the most specific mapping wins: the file is
    rewritten to s3://other-bucket/data/data.parquet rather than
    s3://new-bucket/warehouse/db/tbl/data.parquet.

    Without the length-based selection, the result would be
    non-deterministic (it would depend on map iteration order) and could
    apply the wrong, less-specific prefix.
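
    To make the example concrete, here is a minimal, self-contained sketch
    of the same selection applied to the two mappings above. The class
    LongestPrefixDemo and the helpers lookup and rewrite are hypothetical
    names used only for illustration. In particular, how the matched entry
    is applied to the path (and leaving unmatched paths unchanged) is an
    assumption, not necessarily what the PR does.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not part of RewriteTablePathUtil.
class LongestPrefixDemo {

  // Mirrors the lookup in the diff: keep only the entries whose key is a
  // prefix of the path, then take the entry with the longest key.
  static Map.Entry<String, String> lookup(String path, Map<String, String> prefixMappings) {
    if (path == null || prefixMappings == null || prefixMappings.isEmpty()) {
      return null;
    }
    return prefixMappings.entrySet().stream()
        .filter(e -> path.startsWith(e.getKey()))
        .max(Comparator.comparingInt(e -> e.getKey().length()))
        .orElse(null);
  }

  // Assumption for illustration: swap the matched source prefix for its
  // target and return unmatched paths unchanged.
  static String rewrite(String path, Map<String, String> prefixMappings) {
    Map.Entry<String, String> match = lookup(path, prefixMappings);
    if (match == null) {
      return path;
    }
    return match.getValue() + path.substring(match.getKey().length());
  }

  public static void main(String[] args) {
    Map<String, String> mappings = new LinkedHashMap<>();
    mappings.put("s3://bucket/warehouse/", "s3://new-bucket/warehouse/");
    mappings.put("s3://bucket/warehouse/db/tbl/", "s3://other-bucket/data/");

    // Both keys are prefixes of this path; the longer, more specific one wins.
    System.out.println(rewrite("s3://bucket/warehouse/db/tbl/data.parquet", mappings));
    // prints s3://other-bucket/data/data.parquet
  }
}
```

    Because the choice depends only on key length, the result is the same
    regardless of the order in which the map iterates its entries.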



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

