slfan1989 commented on PR #13882: URL: https://github.com/apache/iceberg/pull/13882#issuecomment-3209649471
> > which caused the path prefix validation to fail > > can you elaborate which path prefix validation you mean here? I think it would be good to have a test here as well to show what was causing the issue that required this fix > > > while our unit tests consistently used toURI().toString() > > I do actually see quite a lot of other places in tests that use `getAbsolutePath()`, such as https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java#L124. > > I'm not saying that this is a good/bad change, I'm just trying to understand why this is all of a sudden needed now @nastra Many thanks for your comment! In #13837, I added a new unit test to verify that the two newly introduced metrics, `rewrittenDeleteFilesCount` and `rewrittenManifestFilesCount`, are greater than 0. This unit test was added to `TestRewriteTablePathProcedure` and is executed across multiple catalogs: `HIVE`, `HADOOP`, `SPARK_SESSION`, and `REST`. The test passes normally under `HIVE`, `HADOOP`, and `SPARK_SESSION`, but it fails under the `REST` catalog with the following exception: ``` Path file:/var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/data/deletes.parquet does not start with /var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/ java.lang.IllegalArgumentException: Path file:/var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/data/deletes.parquet does not start with /var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/iceberg_warehouse10689209309257803697/iceberg_data/default/table/ at org.apache.iceberg.RewriteTablePathUtil.relativize(RewriteTablePathUtil.java:686) at org.apache.iceberg.RewriteTablePathUtil.newPath(RewriteTablePathUtil.java:663) at org.apache.iceberg.RewriteTablePathUtil.writeDeleteFileEntry(RewriteTablePathUtil.java:492) at org.apache.iceberg.RewriteTablePathUtil.lambda$rewriteDeleteManifest$5(RewriteTablePathUtil.java:438) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ``` The root cause of this issue lies in the **different ways the base Hive warehouse path is generated in the test environment**: ``` warehouseLocation = new File(tmp, "iceberg_data").getAbsolutePath(); ``` This approach produces an absolute path without the file: prefix, for example: ``` /var/folders/.../iceberg_data/default/table/ ``` In contrast, most other unit tests use the following: ``` String location = targetTableDir.toFile().toURI().toString(); ``` This method generates a path with the file: prefix, for example: ``` file:/var/folders/.../iceberg_data/default/table/ ``` <img width="1822" height="707" alt="image" src="https://github.com/user-attachments/assets/7cdd0d89-ca29-4446-b764-76f57a11e17e" /> From my perspective, I also believe that using .toURI().toString() provides better compatibility. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
