Copilot commented on code in PR #9430:
URL: https://github.com/apache/gravitino/pull/9430#discussion_r2601963187


##########
scripts/h2/schema-1.1.0-h2.sql:
##########
@@ -61,6 +63,7 @@ CREATE TABLE IF NOT EXISTS `schema_meta` (
     `last_version` INT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'schema last 
version',
     `deleted_at` BIGINT(20) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'schema 
deleted at',
     PRIMARY KEY (schema_id),
+    KEY idx_name_da (catalog_name, deleted_at),

Review Comment:
   The index is created on the wrong column. It should be `(schema_name, 
deleted_at)` but is currently `(catalog_name, deleted_at)`. This will not 
provide the intended query optimization for schema lookups.
   ```suggestion
       KEY idx_name_da (schema_name, deleted_at),
   ```



##########
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java:
##########
@@ -115,18 +135,24 @@ public static void checkMetalake(NameIdentifier ident, 
EntityStore store)
    */
   public static boolean metalakeInUse(EntityStore store, NameIdentifier ident)
       throws NoSuchMetalakeException {
-    try {
-      BaseMetalake metalake = store.get(ident, EntityType.METALAKE, 
BaseMetalake.class);
-      return (boolean)
-          metalake.propertiesMetadata().getOrDefault(metalake.properties(), 
PROPERTY_IN_USE);
-    } catch (NoSuchEntityException e) {
-      LOG.warn("Metalake {} does not exist", ident, e);
-      throw new NoSuchMetalakeException(METALAKE_DOES_NOT_EXIST_MSG, ident);
-
-    } catch (IOException e) {
-      LOG.error("Failed to do store operation", e);
-      throw new RuntimeException(e);
-    }
+
+    BaseMetalake metalake =
+        metalakeCache.get(
+            ident,
+            key -> {
+              try {
+                return store.get(key, EntityType.METALAKE, BaseMetalake.class);
+              } catch (NoSuchEntityException e) {
+                LOG.warn("Metalake {} does not exist", key, e);
+                throw new RuntimeException(e);
+              } catch (IOException e) {
+                LOG.error("Failed to do store operation", e);
+                throw new RuntimeException(e);
+              }
+            });

Review Comment:
   The `metalakeInUse` method catches `NoSuchEntityException` and wraps it in a 
`RuntimeException`, but the caller expects a `NoSuchMetalakeException` as 
indicated by the method's throws declaration. This breaks the expected 
exception contract and may cause issues for callers trying to handle 
`NoSuchMetalakeException` specifically.



##########
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java:
##########
@@ -84,7 +103,8 @@ public MetalakeManager(EntityStore store, IdGenerator 
idGenerator) {
     // directly without list/get metalake first.
     BaseMetalake[] baseMetalakes = listMetalakes();
     for (BaseMetalake baseMetalake : baseMetalakes) {
-      loadMetalake(baseMetalake.nameIdentifier());
+      BaseMetalake newBaseMetalake = 
loadMetalake(baseMetalake.nameIdentifier());
+      metalakeCache.put(baseMetalake.nameIdentifier(), newBaseMetalake);
     }

Review Comment:
   The cache is invalidated during `loadMetalake` (lines 106-107), but this 
creates a potential race condition. If one thread is loading a metalake and 
another is checking if it's in use via `metalakeInUse`, the cached value may be 
stale. Consider invalidating the cache before calling `loadMetalake` during 
initialization, or ensuring cache consistency is maintained across concurrent 
access patterns.



##########
core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableMetaBaseSQLProvider.java:
##########
@@ -230,4 +233,61 @@ public String deleteTableMetasByLegacyTimeline(
         + TABLE_NAME
         + " WHERE deleted_at > 0 AND deleted_at < #{legacyTimeline} LIMIT 
#{limit}";
   }
+
+  public String selectTableByFullQualifiedName(
+      @Param("metalakeName") String metalakeName,
+      @Param("catalogName") String catalogName,
+      @Param("schemaName") String schemaName,
+      @Param("tableName") String tableName) {
+    return """
+            SELECT
+                tm.table_id AS tableId,
+                tm.table_name AS tableName,
+                tm.metalake_id AS metalakeId,
+                tm.catalog_id AS catalogId,
+                tm.schema_id AS schemaId,
+                tm.audit_info AS auditInfo,
+                tm.current_version AS currentVersion,
+                tm.last_version AS lastVersion,
+                tm.deleted_at AS deletedAt,
+                tvi.format AS format,
+                tvi.properties AS properties,
+                tvi.partitioning AS partitions,
+                tvi.sort_orders AS sortOrders,
+                tvi.distribution AS distribution,
+                tvi.indexes AS indexes,
+                tvi.comment AS comment
+            FROM
+                %s mm
+            INNER JOIN
+                %s cm ON mm.metalake_id = cm.metalake_id
+            INNER JOIN
+                %s sm ON cm.catalog_id = sm.catalog_id
+            INNER JOIN
+                %s tm ON sm.schema_id = tm.schema_id
+            LEFT JOIN
+                %s tvi ON tm.table_id = tvi.table_id
+                AND tm.current_version = tvi.version
+                AND tvi.deleted_at = 0
+            WHERE
+                mm.metalake_name = '%s'
+                AND mm.deleted_at = 0
+                AND cm.catalog_name = '%s'
+                AND cm.deleted_at = 0
+                AND sm.schema_name = '%s'
+                AND sm.deleted_at = 0
+                AND tm.table_name = '%s'
+                AND tm.deleted_at = 0;
+            """
+        .formatted(
+            MetalakeMetaMapper.TABLE_NAME,
+            CatalogMetaMapper.TABLE_NAME,
+            SchemaMetaMapper.TABLE_NAME,
+            TABLE_NAME,
+            TableVersionMapper.TABLE_NAME,
+            metalakeName,
+            catalogName,
+            schemaName,
+            tableName);

Review Comment:
   SQL injection vulnerability: User input parameters (`metalakeName`, 
`catalogName`, `schemaName`, `tableName`) are directly interpolated into the 
SQL query using `String.formatted()` instead of using parameterized queries. 
This allows malicious input to manipulate the SQL query. Use MyBatis parameter 
placeholders (e.g., `#{metalakeName}`) instead of string formatting for the 
WHERE clause values.



##########
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java:
##########
@@ -165,16 +191,21 @@ public BaseMetalake loadMetalake(NameIdentifier ident) 
throws NoSuchMetalakeExce
         ident,
         LockType.READ,
         () -> {
-          try {
-            BaseMetalake baseMetalake = store.get(ident, EntityType.METALAKE, 
BaseMetalake.class);
-            return newMetalakeWithResolvedProperties(baseMetalake);
-          } catch (NoSuchEntityException e) {
-            LOG.warn("Metalake {} does not exist", ident, e);
-            throw new NoSuchMetalakeException(METALAKE_DOES_NOT_EXIST_MSG, 
ident);
-          } catch (IOException ioe) {
-            LOG.error("Loading Metalake {} failed due to storage issues", 
ident, ioe);
-            throw new RuntimeException(ioe);
-          }
+          BaseMetalake baseMetalake =
+              metalakeCache.get(
+                  ident,
+                  key -> {
+                    try {
+                      return store.get(key, EntityType.METALAKE, 
BaseMetalake.class);
+                    } catch (NoSuchEntityException e) {
+                      LOG.warn("Metalake {} does not exist", key, e);
+                      throw new RuntimeException(e);
+                    } catch (IOException e) {
+                      LOG.error("Failed to do store operation", e);
+                      throw new RuntimeException(e);
+                    }
+                  });

Review Comment:
   The cache loading function wraps `NoSuchEntityException` in a 
`RuntimeException`, but the caller (`loadMetalake`) expects a 
`NoSuchMetalakeException` as indicated by the method's throws declaration. This 
breaks the expected exception contract and will cause the wrong exception type 
to be thrown to callers.



##########
core/src/main/java/org/apache/gravitino/storage/relational/service/TableMetaService.java:
##########
@@ -342,4 +344,21 @@ private TablePO getTablePOBySchemaIdAndName(Long schemaId, 
String tableName) {
     }
     return tablePO;
   }
+

Review Comment:
   The new method `getTableByFullQualifiedName` is private and lacks 
documentation explaining its purpose, parameters, and when it should be used 
versus `getTablePOBySchemaIdAndName`. Add a JavaDoc comment describing this 
method's functionality and its relationship to the query optimization changes.
   ```suggestion
   
     /**
      * Retrieves a {@link TablePO} (table metadata object) by its fully 
qualified name, consisting of
      * metalake, catalog, schema, and table names. This method is used when 
the full namespace context
      * is available and is preferred for query optimization, as it allows 
direct lookup without first
      * resolving schema IDs. In contrast, {@link 
#getTablePOBySchemaIdAndName(Long, String)} should be
      * used when only the schema ID and table name are known.
      *
      * <p>This method was introduced as part of query optimization changes to 
reduce the number of
      * database lookups required for table metadata retrieval.
      *
      * @param metalakeName the name of the metalake
      * @param catalogName the name of the catalog
      * @param schemaName the name of the schema
      * @param tableName the name of the table
      * @return the {@link TablePO} corresponding to the fully qualified name
      * @throws NoSuchEntityException if no table is found with the specified 
fully qualified name
      */
   ```



##########
scripts/mysql/schema-1.1.0-mysql.sql:
##########
@@ -77,6 +80,7 @@ CREATE TABLE IF NOT EXISTS `table_meta` (
     UNIQUE KEY `uk_sid_tn_del` (`schema_id`, `table_name`, `deleted_at`),
     KEY `idx_mid` (`metalake_id`),
     KEY `idx_cid` (`catalog_id`)

Review Comment:
   Missing comma after `KEY \`idx_cid\` (\`catalog_id\`)` on line 82. This 
causes a SQL syntax error.
   ```suggestion
       KEY `idx_cid` (`catalog_id`),
   ```



##########
core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableMetaBaseSQLProvider.java:
##########
@@ -230,4 +233,61 @@ public String deleteTableMetasByLegacyTimeline(
         + TABLE_NAME
         + " WHERE deleted_at > 0 AND deleted_at < #{legacyTimeline} LIMIT 
#{limit}";
   }
+
+  public String selectTableByFullQualifiedName(
+      @Param("metalakeName") String metalakeName,
+      @Param("catalogName") String catalogName,
+      @Param("schemaName") String schemaName,
+      @Param("tableName") String tableName) {
+    return """
+            SELECT
+                tm.table_id AS tableId,
+                tm.table_name AS tableName,
+                tm.metalake_id AS metalakeId,
+                tm.catalog_id AS catalogId,
+                tm.schema_id AS schemaId,
+                tm.audit_info AS auditInfo,
+                tm.current_version AS currentVersion,
+                tm.last_version AS lastVersion,
+                tm.deleted_at AS deletedAt,
+                tvi.format AS format,
+                tvi.properties AS properties,
+                tvi.partitioning AS partitions,
+                tvi.sort_orders AS sortOrders,
+                tvi.distribution AS distribution,
+                tvi.indexes AS indexes,
+                tvi.comment AS comment
+            FROM
+                %s mm
+            INNER JOIN
+                %s cm ON mm.metalake_id = cm.metalake_id
+            INNER JOIN
+                %s sm ON cm.catalog_id = sm.catalog_id
+            INNER JOIN
+                %s tm ON sm.schema_id = tm.schema_id
+            LEFT JOIN
+                %s tvi ON tm.table_id = tvi.table_id
+                AND tm.current_version = tvi.version
+                AND tvi.deleted_at = 0
+            WHERE
+                mm.metalake_name = '%s'
+                AND mm.deleted_at = 0
+                AND cm.catalog_name = '%s'
+                AND cm.deleted_at = 0
+                AND sm.schema_name = '%s'
+                AND sm.deleted_at = 0

Review Comment:
   [nitpick] The query performs multiple INNER JOINs without leveraging foreign 
key relationships or additional join conditions on `deleted_at`. Consider 
adding `AND mm.deleted_at = 0`, `AND cm.deleted_at = 0`, `AND sm.deleted_at = 
0` to the respective JOIN conditions (not just the WHERE clause) to allow the 
query optimizer to filter rows earlier and potentially use the newly added 
indexes more efficiently.
   ```suggestion
                   %s cm ON mm.metalake_id = cm.metalake_id AND mm.deleted_at = 0
               INNER JOIN
                   %s sm ON cm.catalog_id = sm.catalog_id AND cm.deleted_at = 0
               INNER JOIN
                   %s tm ON sm.schema_id = tm.schema_id AND sm.deleted_at = 0
               LEFT JOIN
                   %s tvi ON tm.table_id = tvi.table_id
                   AND tm.current_version = tvi.version
                   AND tvi.deleted_at = 0
               WHERE
                   mm.metalake_name = '%s'
                   AND cm.catalog_name = '%s'
                   AND sm.schema_name = '%s'
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to