slfan1989 commented on PR #13882:
URL: https://github.com/apache/iceberg/pull/13882#issuecomment-3231885425

   After reviewing the latest survey results, I recommend using 
`.toURI().toString()` for the replacement here.
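
For reference, a minimal sketch of how the two forms differ for the same directory. The class name `LocationFormats` and the `warehouse` directory under `java.io.tmpdir` are assumptions for illustration only and are not part of the PR.

```java
import java.io.File;

public class LocationFormats {
  // Plain filesystem path with no scheme, e.g. /tmp/warehouse on Unix-like systems.
  static String asAbsolutePath(File dir) {
    return dir.getAbsolutePath();
  }

  // URI form of the same directory; the resulting string starts with the "file:" scheme.
  static String asUri(File dir) {
    return dir.toURI().toString();
  }

  public static void main(String[] args) {
    // Hypothetical directory used only for illustration; it does not need to exist.
    File warehouse = new File(System.getProperty("java.io.tmpdir"), "warehouse");
    System.out.println(asAbsolutePath(warehouse));
    System.out.println(asUri(warehouse));
  }
}
```

Note that `File#toURI` adds a trailing slash when the path is an existing directory, so the exact string can differ between an existing and a not-yet-created location.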
   
   Looking at all implementations of `CatalogTestBase`, I found that it 
currently supports four catalog types: `HIVE`, `HADOOP`, `SPARK_SESSION`, and 
`REST`. For the `HIVE`, `HADOOP`, and `SPARK_SESSION` catalogs, the default 
table location is in URI format, for example:
   
   ```
   file:/var/folders/2k/21gv5vmx6z7dlr1g0_jtm8r80000ks/T/hive12918907709677177324/table
   ```
   
   **When creating a table, the BaseMetastoreCatalog#create method is invoked, 
but different Catalog types use their own implementations.**
   
   
https://github.com/apache/iceberg/blob/28555ad8fbad77a4067b6ee2afbdea15428dea26/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L189-L200
   
   The key code segment is:
   
   ```
   String baseLocation = location != null ? location : defaultWarehouseLocation(identifier);
   ```
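
To make the fallback explicit, here is a small self-contained sketch of that line. The class `BaseLocationSketch`, the extra `warehouse` parameter, and the body of `defaultWarehouseLocation` are hypothetical stand-ins; each real catalog computes its default location in its own way.

```java
public class BaseLocationSketch {
  // Hypothetical default: the warehouse root plus the identifier with dots turned into slashes.
  // Real catalogs (Hive, Hadoop, JDBC) each derive this value differently.
  static String defaultWarehouseLocation(String warehouse, String identifier) {
    return warehouse + "/" + identifier.replace('.', '/');
  }

  // Mirrors the quoted line: an explicit location wins, otherwise the catalog default is used.
  static String baseLocation(String location, String warehouse, String identifier) {
    return location != null ? location : defaultWarehouseLocation(warehouse, identifier);
  }

  public static void main(String[] args) {
    System.out.println(baseLocation(null, "file:/tmp/warehouse", "db.tbl"));
    System.out.println(baseLocation("file:/custom/path", "file:/tmp/warehouse", "db.tbl"));
  }
}
```

The point is that when no location is passed, the format of the table location (URI or plain path) is decided entirely by the catalog's default.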
   
   > HIVE Catalog
   
   For the HIVE Catalog, the `HiveCatalog` is invoked during table creation. 
`HiveCatalog` retrieves the table's location via the HMS interface, which 
returns a path in URI format.
   
   
https://github.com/apache/iceberg/blob/28555ad8fbad77a4067b6ee2afbdea15428dea26/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L698-L718
   
   The results of the local variables are as follows:
   
   <img width="1890" height="984" alt="image" src="https://github.com/user-attachments/assets/bec7ca5c-161f-4409-a390-4852265a7e00" />
   
   The path is displayed in the form of a URI.
   
   > HADOOP Catalog
   
   When creating a table, the HADOOP Catalog uses `HadoopCatalog`, and the 
warehouse path is prefixed with `file:`, so the location is effectively a URI.
   
   TestBaseWithCatalog#configureValidationCatalog
   
   
https://github.com/apache/iceberg/blob/28555ad8fbad77a4067b6ee2afbdea15428dea26/spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java#L123-L125
   
   <img width="1890" height="984" alt="image" src="https://github.com/user-attachments/assets/266341b8-cc4e-4c13-93e7-8e5a466d9c89" />
   
   The path is displayed in the form of a URI.
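
One way to check whether a location string is in URI form is to test whether it carries a scheme. This is only a sketch: the class `LocationSchemeCheck` and the `/tmp/warehouse` path are hypothetical, and it assumes Unix-style paths (a Windows path with backslashes would not parse as a URI).

```java
import java.net.URI;

public class LocationSchemeCheck {
  // A location counts as "URI form" here if it parses as a URI and has a scheme such as "file".
  static boolean hasScheme(String location) {
    return URI.create(location).getScheme() != null;
  }

  public static void main(String[] args) {
    String path = "/tmp/warehouse"; // hypothetical absolute path
    System.out.println(hasScheme(path));           // plain path, no scheme
    System.out.println(hasScheme("file:" + path)); // prefixed with "file:", has a scheme
  }
}
```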
   
   > SPARK SESSION Catalog
   
   When creating a table, SPARK_SESSION also uses HiveCatalog, so I won't 
repeat the related information.
   
   The path is displayed in the form of a URI.
   
   > REST Catalog
   
   The REST Catalog uses the JDBC Catalog, and the base path is initialized 
in `RESTCatalogServer`.
   
   
https://github.com/apache/iceberg/blob/28555ad8fbad77a4067b6ee2afbdea15428dea26/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java#L89-L93
   
   <img width="1918" height="1053" alt="image" src="https://github.com/user-attachments/assets/4717d3a8-8a50-45e7-80c1-a4df47884a0c" />
   
   This is displayed in the form of an absolute path, not a URI.
   
   
   So I still believe that the `REST Catalog` should maintain consistency with 
`HiveCatalog`, `HadoopCatalog`, and `SparkSessionCatalog` by using the `URI` 
format to represent the location of the database.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

