saihemanth-cloudera commented on code in PR #5145:
URL: https://github.com/apache/hive/pull/5145#discussion_r1833393157
##########
hms-catalog/src/test/resources/scripts/metastore/upgrade/derby/hive-schema-4.1.0.derby.sql:
##########
@@ -0,0 +1,870 @@
+-- Timestamp: 2011-09-22 15:32:02.024
+-- Source database is: /home/carl/Work/repos/hive1/metastore/scripts/upgrade/derby/mdb
+-- Connection URL is: jdbc:derby:/home/carl/Work/repos/hive1/metastore/scripts/upgrade/derby/mdb
+-- Specified schema is: APP
+-- appendLogs: false
+
+-- ----------------------------------------------
+-- DDL Statements for functions
+-- ----------------------------------------------
+
+CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii' ;

Review Comment:
   Is this file used to set up the test framework? It is not a good idea to duplicate the scripts, since future changes would then have to be made in multiple places. If you are using this to run tests, you can set up the script file the way it is done here: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/pom.xml#L585-L597

##########
common/pom.xml:
##########
@@ -330,6 +330,10 @@
       <groupId>javolution</groupId>
       <artifactId>javolution</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-storage-api</artifactId>

Review Comment:
   Why do we need storage-api as a dependency in the "common" module?
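   For illustration, the linked pom section copies the canonical scripts at build time rather than duplicating them. A minimal sketch of that kind of setup — the plugin, phase, and directory paths below are assumptions, not the exact configuration from metastore-server/pom.xml, so please refer to the linked file for the real one:

```xml
<!-- Sketch only: copy the canonical metastore upgrade scripts into this
     module's test resources at build time instead of checking in a
     duplicate copy. Plugin choice and paths are assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-metastore-scripts</id>
      <phase>generate-test-resources</phase>
      <goals>
        <goal>copy-resources</goal>
      </goals>
      <configuration>
        <outputDirectory>${project.build.testOutputDirectory}/scripts/metastore/upgrade</outputDirectory>
        <resources>
          <resource>
            <!-- hypothetical source directory for the canonical scripts -->
            <directory>${project.parent.basedir}/standalone-metastore/metastore-server/src/main/sql</directory>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>
```

   With a setup like this, only the canonical copy of the scripts ever needs to change.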
##########
iceberg/iceberg-catalog/pom.xml:
##########
@@ -32,6 +32,21 @@
       <groupId>org.apache.hive</groupId>
       <artifactId>hive-iceberg-shading</artifactId>
     </dependency>
+    <!-- Intellij dislikes shaded Iceberg: uncomment the following block helps when coding/debugging -->
+    <!--
+    <dependency>
+      <groupId>org.apache.iceberg</groupId>
+      <artifactId>iceberg-api</artifactId>
+      <version>${iceberg.version}</version>
+      <optional>true</optional>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.iceberg</groupId>
+      <artifactId>iceberg-core</artifactId>
+      <version>${iceberg.version}</version>
+      <optional>true</optional>
+    </dependency>
+    -->

Review Comment:
   nit: please remove this commented-out section.

##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -2235,6 +2235,10 @@ public static enum ConfVars {
     HIVE_ICEBERG_ALLOW_DATAFILES_IN_TABLE_LOCATION_ONLY("hive.iceberg.allow.datafiles.in.table.location.only", false,
         "If this is set to true, then all the data files being read should be withing the table location"),
+    HIVE_ICEBERG_CATALOG_ACTOR_CLASS("hive.iceberg.catalog.actor.class", "org.apache.iceberg.rest.HMSCatalogActor",
+        "Catalog actor implementation class. Default value is an embedded (no-thrift) catalog actor."
+        + "For a thrift-based single-tenant actor, use \"org.apache.iceberg.hive.HiveCatalogActor\".."),
+

Review Comment:
   nit: I think "HMSRestCatalogActor" and "HiveThriftCatalogActor" would make the classes more intuitive to understand.
##########
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -240,19 +240,14 @@ private void renameTableOrView(
     String fromName = from.name();

     try {
-      Table table = clients.run(client -> client.getTable(fromDatabase, fromName));
-      validateTableIsIcebergTableOrView(contentType, table, CatalogUtil.fullTableName(name, from));
+      Table table = actor.getTable(fromDatabase, fromName);
+      validateTableIsIcebergTableOrView(contentType, table, TableIdentifier.of(from.namespace(), fromName).name());

Review Comment:
   nit: Why not use CatalogUtil.fullTableName() instead of TableIdentifier.of()?

##########
pom.xml:
##########
@@ -175,6 +175,10 @@
     <!-- Leaving libfb303 at 0.9.3 regardless of libthrift: As per THRIFT-4613 The Apache Thrift project does not publish items related to fb303 at this point -->
     <libfb303.version>0.9.3</libfb303.version>
     <libthrift.version>0.16.0</libthrift.version>
+    <!--
+    <thrift.home>/usr/local</thrift.home>
+    -->

Review Comment:
   nit: remove this comment

##########
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -143,11 +149,15 @@ public FileIO io() {
   protected void doRefresh() {
     String metadataLocation = null;
     try {
-      Table table = metaClients.run(client -> client.getTable(database, tableName));
-      HiveOperationsBase.validateTableIsIceberg(table, fullName);
-
-      metadataLocation = table.getParameters().get(METADATA_LOCATION_PROP);
-
+      Table table = actor.getTable(database, tableName);
+      if (table != null) {
+        HiveOperationsBase.validateTableIsIceberg(table, fullName);
+        metadataLocation = table.getParameters().get(METADATA_LOCATION_PROP);
+      } else {
+        if (currentMetadataLocation() != null) {
+          throw new NoSuchTableException("No such table: %s.%s", database, tableName);
+        }

Review Comment:
   I think we don't need this else block, since there is already a check in the catch() block.
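   To illustrate the difference the first comment above points at: CatalogUtil.fullTableName() yields a fully qualified name (e.g. `hive.db1.t1`), which is more useful in a validation error message than the bare table name that `TableIdentifier.of(...).name()` returns. A self-contained sketch — the two static methods below only mimic the behavior of the Iceberg helpers, they are not the real API:

```java
// Sketch: stand-ins mimicking org.apache.iceberg.CatalogUtil.fullTableName
// and TableIdentifier.of(namespace, name).name(), to show why the former
// is preferable for error messages.
public class FullNameSketch {

    // mimics CatalogUtil.fullTableName(catalogName, identifier):
    // "catalog.db.table"
    static String fullTableName(String catalog, String[] namespace, String name) {
        return catalog + "." + String.join(".", namespace) + "." + name;
    }

    // mimics TableIdentifier.of(namespace, name).name():
    // only the last element, with no catalog or namespace context
    static String identifierName(String[] namespace, String name) {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(fullTableName("hive", new String[]{"db1"}, "t1")); // hive.db1.t1
        System.out.println(identifierName(new String[]{"db1"}, "t1"));        // t1
    }
}
```

   In a "not an Iceberg table" error, the fully qualified form tells the user which catalog and database the offending table lives in; the bare name does not.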
##########
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -527,18 +507,26 @@ protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) {
   }

   private String databaseLocation(String databaseName) {
-    String warehouseLocation = conf.get(HiveConf.ConfVars.METASTORE_WAREHOUSE.varname);
-    Preconditions.checkNotNull(
-        warehouseLocation, "Warehouse location is not set: hive.metastore.warehouse.dir=null");
+    String warehouseLocation = conf.get("metastore.warehouse.dir");
+    if (warehouseLocation == null) {
+      warehouseLocation = conf.get(HiveConf.ConfVars.METASTORE_WAREHOUSE.varname);
+    }
+    Preconditions.checkNotNull(warehouseLocation,
+        "Warehouse location is not set: hive.metastore.warehouse.dir=null");
     warehouseLocation = LocationUtil.stripTrailingSlash(warehouseLocation);
-    return String.format("%s/%s.db", warehouseLocation, databaseName);
+    return String.format("%s/%s.db", warehouseLocation, databaseName.toLowerCase());
   }

Review Comment:
   Why check 'metastore.warehouse.dir'? Shouldn't we actually be checking HiveConf.ConfVars.METASTORE_WAREHOUSE.varname?

##########
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##########
@@ -527,18 +507,26 @@ protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) {
-  private String getExternalWarehouseLocation() {
-    String warehouseLocation = conf.get(HiveConf.ConfVars.HIVE_METASTORE_WAREHOUSE_EXTERNAL.varname);
+
+  private String databaseLocationInExternalWarehouse(String databaseName) {
+    String warehouseLocation = conf.get("metastore.warehouse.external.dir");

Review Comment:
   nit: same as above
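   The question above is about which configuration key to read. What the PR's diff implements is a two-key fallback: try the MetastoreConf-style key first, then the HiveConf key. A minimal self-contained sketch of that lookup — a plain Map stands in for the Hadoop Configuration object, and the key names are the ones quoted in the diff:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fallback in the PR's databaseLocation(): read the
// MetastoreConf-style key first, then fall back to the HiveConf key.
// A Map stands in for org.apache.hadoop.conf.Configuration here.
public class WarehouseLookup {
    static final String METASTORE_KEY = "metastore.warehouse.dir";      // MetastoreConf name
    static final String HIVECONF_KEY = "hive.metastore.warehouse.dir";  // HiveConf.ConfVars.METASTORE_WAREHOUSE.varname

    static String warehouseLocation(Map<String, String> conf) {
        String loc = conf.get(METASTORE_KEY);
        if (loc == null) {
            loc = conf.get(HIVECONF_KEY);
        }
        if (loc == null) {
            throw new IllegalStateException(
                "Warehouse location is not set: hive.metastore.warehouse.dir=null");
        }
        // strip a trailing slash, as LocationUtil.stripTrailingSlash does
        return loc.endsWith("/") ? loc.substring(0, loc.length() - 1) : loc;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(HIVECONF_KEY, "/warehouse/");
        // only the HiveConf key is set, so the lookup falls back to it
        System.out.println(warehouseLocation(conf)); // /warehouse
    }
}
```

   Whether the raw string literal or the ConfVars/MetastoreConf constant should be used for the first key is exactly the reviewer's point: a constant keeps the two lookups from silently diverging if the key name ever changes.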
##########
serde/pom.xml:
##########
@@ -253,6 +258,12 @@
       <version>${mockito-core.version}</version>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-classification</artifactId>
+      <version>4.1.0-SNAPSHOT</version>
+      <scope>compile</scope>
+    </dependency>

Review Comment:
   Why is this dependency added to serde's pom? I don't see it being used anywhere.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org