Copilot commented on code in PR #9430:
URL: https://github.com/apache/gravitino/pull/9430#discussion_r2601963187
##########
scripts/h2/schema-1.1.0-h2.sql:
##########
@@ -61,6 +63,7 @@ CREATE TABLE IF NOT EXISTS `schema_meta` (
`last_version` INT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'schema last
version',
`deleted_at` BIGINT(20) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'schema
deleted at',
PRIMARY KEY (schema_id),
+ KEY idx_name_da (catalog_name, deleted_at),
Review Comment:
The index is created on the wrong column. It should be `(schema_name,
deleted_at)` but is currently `(catalog_name, deleted_at)`. This will not
provide the intended query optimization for schema lookups.
```suggestion
KEY idx_name_da (schema_name, deleted_at),
```
##########
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java:
##########
@@ -115,18 +135,24 @@ public static void checkMetalake(NameIdentifier ident,
EntityStore store)
*/
public static boolean metalakeInUse(EntityStore store, NameIdentifier ident)
throws NoSuchMetalakeException {
- try {
- BaseMetalake metalake = store.get(ident, EntityType.METALAKE,
BaseMetalake.class);
- return (boolean)
- metalake.propertiesMetadata().getOrDefault(metalake.properties(),
PROPERTY_IN_USE);
- } catch (NoSuchEntityException e) {
- LOG.warn("Metalake {} does not exist", ident, e);
- throw new NoSuchMetalakeException(METALAKE_DOES_NOT_EXIST_MSG, ident);
-
- } catch (IOException e) {
- LOG.error("Failed to do store operation", e);
- throw new RuntimeException(e);
- }
+
+ BaseMetalake metalake =
+ metalakeCache.get(
+ ident,
+ key -> {
+ try {
+ return store.get(key, EntityType.METALAKE, BaseMetalake.class);
+ } catch (NoSuchEntityException e) {
+ LOG.warn("Metalake {} does not exist", key, e);
+ throw new RuntimeException(e);
+ } catch (IOException e) {
+ LOG.error("Failed to do store operation", e);
+ throw new RuntimeException(e);
+ }
+ });
Review Comment:
The `metalakeInUse` method catches `NoSuchEntityException` and wraps it in a
`RuntimeException`, but the caller expects a `NoSuchMetalakeException` as
indicated by the method's throws declaration. This breaks the expected
exception contract and may cause issues for callers trying to handle
`NoSuchMetalakeException` specifically.
##########
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java:
##########
@@ -84,7 +103,8 @@ public MetalakeManager(EntityStore store, IdGenerator
idGenerator) {
// directly without list/get metalake first.
BaseMetalake[] baseMetalakes = listMetalakes();
for (BaseMetalake baseMetalake : baseMetalakes) {
- loadMetalake(baseMetalake.nameIdentifier());
+ BaseMetalake newBaseMetalake =
loadMetalake(baseMetalake.nameIdentifier());
+ metalakeCache.put(baseMetalake.nameIdentifier(), newBaseMetalake);
}
Review Comment:
The cache is invalidated during `loadMetalake` (lines 106-107), but this
creates a potential race condition. If one thread is loading a metalake and
another is checking if it's in use via `metalakeInUse`, the cached value may be
stale. Consider invalidating the cache before calling `loadMetalake` during
initialization, or ensuring cache consistency is maintained across concurrent
access patterns.
##########
core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableMetaBaseSQLProvider.java:
##########
@@ -230,4 +233,61 @@ public String deleteTableMetasByLegacyTimeline(
+ TABLE_NAME
+ " WHERE deleted_at > 0 AND deleted_at < #{legacyTimeline} LIMIT
#{limit}";
}
+
+ public String selectTableByFullQualifiedName(
+ @Param("metalakeName") String metalakeName,
+ @Param("catalogName") String catalogName,
+ @Param("schemaName") String schemaName,
+ @Param("tableName") String tableName) {
+ return """
+ SELECT
+ tm.table_id AS tableId,
+ tm.table_name AS tableName,
+ tm.metalake_id AS metalakeId,
+ tm.catalog_id AS catalogId,
+ tm.schema_id AS schemaId,
+ tm.audit_info AS auditInfo,
+ tm.current_version AS currentVersion,
+ tm.last_version AS lastVersion,
+ tm.deleted_at AS deletedAt,
+ tvi.format AS format,
+ tvi.properties AS properties,
+ tvi.partitioning AS partitions,
+ tvi.sort_orders AS sortOrders,
+ tvi.distribution AS distribution,
+ tvi.indexes AS indexes,
+ tvi.comment AS comment
+ FROM
+ %s mm
+ INNER JOIN
+ %s cm ON mm.metalake_id = cm.metalake_id
+ INNER JOIN
+ %s sm ON cm.catalog_id = sm.catalog_id
+ INNER JOIN
+ %s tm ON sm.schema_id = tm.schema_id
+ LEFT JOIN
+ %s tvi ON tm.table_id = tvi.table_id
+ AND tm.current_version = tvi.version
+ AND tvi.deleted_at = 0
+ WHERE
+ mm.metalake_name = '%s'
+ AND mm.deleted_at = 0
+ AND cm.catalog_name = '%s'
+ AND cm.deleted_at = 0
+ AND sm.schema_name = '%s'
+ AND sm.deleted_at = 0
+ AND tm.table_name = '%s'
+ AND tm.deleted_at = 0;
+ """
+ .formatted(
+ MetalakeMetaMapper.TABLE_NAME,
+ CatalogMetaMapper.TABLE_NAME,
+ SchemaMetaMapper.TABLE_NAME,
+ TABLE_NAME,
+ TableVersionMapper.TABLE_NAME,
+ metalakeName,
+ catalogName,
+ schemaName,
+ tableName);
Review Comment:
SQL injection vulnerability: User input parameters (`metalakeName`,
`catalogName`, `schemaName`, `tableName`) are directly interpolated into the
SQL query using `String.formatted()` instead of using parameterized queries.
This allows malicious input to manipulate the SQL query. Use MyBatis parameter
placeholders (e.g., `#{metalakeName}`) instead of string formatting for the
WHERE clause values.
##########
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java:
##########
@@ -165,16 +191,21 @@ public BaseMetalake loadMetalake(NameIdentifier ident)
throws NoSuchMetalakeExce
ident,
LockType.READ,
() -> {
- try {
- BaseMetalake baseMetalake = store.get(ident, EntityType.METALAKE,
BaseMetalake.class);
- return newMetalakeWithResolvedProperties(baseMetalake);
- } catch (NoSuchEntityException e) {
- LOG.warn("Metalake {} does not exist", ident, e);
- throw new NoSuchMetalakeException(METALAKE_DOES_NOT_EXIST_MSG,
ident);
- } catch (IOException ioe) {
- LOG.error("Loading Metalake {} failed due to storage issues",
ident, ioe);
- throw new RuntimeException(ioe);
- }
+ BaseMetalake baseMetalake =
+ metalakeCache.get(
+ ident,
+ key -> {
+ try {
+ return store.get(key, EntityType.METALAKE,
BaseMetalake.class);
+ } catch (NoSuchEntityException e) {
+ LOG.warn("Metalake {} does not exist", key, e);
+ throw new RuntimeException(e);
+ } catch (IOException e) {
+ LOG.error("Failed to do store operation", e);
+ throw new RuntimeException(e);
+ }
+ });
Review Comment:
The cache loading function wraps `NoSuchEntityException` in a
`RuntimeException`, but the caller (`loadMetalake`) expects a
`NoSuchMetalakeException` as indicated by the method's throws declaration. This
breaks the expected exception contract and will cause the wrong exception type
to be thrown to callers.
##########
core/src/main/java/org/apache/gravitino/storage/relational/service/TableMetaService.java:
##########
@@ -342,4 +344,21 @@ private TablePO getTablePOBySchemaIdAndName(Long schemaId,
String tableName) {
}
return tablePO;
}
+
Review Comment:
The new method `getTableByFullQualifiedName` is private and lacks
documentation explaining its purpose, parameters, and when it should be used
versus `getTablePOBySchemaIdAndName`. Add a JavaDoc comment describing this
method's functionality and its relationship to the query optimization changes.
```suggestion
/**
* Retrieves a {@link TablePO} (table metadata object) by its fully
qualified name, consisting of
* metalake, catalog, schema, and table names. This method is used when
the full namespace context
* is available and is preferred for query optimization, as it allows
direct lookup without first
* resolving schema IDs. In contrast, {@link
#getTablePOBySchemaIdAndName(Long, String)} should be
* used when only the schema ID and table name are known.
*
* <p>This method was introduced as part of query optimization changes to
reduce the number of
* database lookups required for table metadata retrieval.
*
* @param metalakeName the name of the metalake
* @param catalogName the name of the catalog
* @param schemaName the name of the schema
* @param tableName the name of the table
* @return the {@link TablePO} corresponding to the fully qualified name
* @throws NoSuchEntityException if no table is found with the specified
fully qualified name
*/
```
##########
scripts/mysql/schema-1.1.0-mysql.sql:
##########
@@ -77,6 +80,7 @@ CREATE TABLE IF NOT EXISTS `table_meta` (
UNIQUE KEY `uk_sid_tn_del` (`schema_id`, `table_name`, `deleted_at`),
KEY `idx_mid` (`metalake_id`),
KEY `idx_cid` (`catalog_id`)
Review Comment:
Missing comma after `KEY \`idx_cid\` (\`catalog_id\`)` on line 82. This
causes a SQL syntax error.
```suggestion
KEY `idx_cid` (`catalog_id`),
```
##########
core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableMetaBaseSQLProvider.java:
##########
@@ -230,4 +233,61 @@ public String deleteTableMetasByLegacyTimeline(
+ TABLE_NAME
+ " WHERE deleted_at > 0 AND deleted_at < #{legacyTimeline} LIMIT
#{limit}";
}
+
+ public String selectTableByFullQualifiedName(
+ @Param("metalakeName") String metalakeName,
+ @Param("catalogName") String catalogName,
+ @Param("schemaName") String schemaName,
+ @Param("tableName") String tableName) {
+ return """
+ SELECT
+ tm.table_id AS tableId,
+ tm.table_name AS tableName,
+ tm.metalake_id AS metalakeId,
+ tm.catalog_id AS catalogId,
+ tm.schema_id AS schemaId,
+ tm.audit_info AS auditInfo,
+ tm.current_version AS currentVersion,
+ tm.last_version AS lastVersion,
+ tm.deleted_at AS deletedAt,
+ tvi.format AS format,
+ tvi.properties AS properties,
+ tvi.partitioning AS partitions,
+ tvi.sort_orders AS sortOrders,
+ tvi.distribution AS distribution,
+ tvi.indexes AS indexes,
+ tvi.comment AS comment
+ FROM
+ %s mm
+ INNER JOIN
+ %s cm ON mm.metalake_id = cm.metalake_id
+ INNER JOIN
+ %s sm ON cm.catalog_id = sm.catalog_id
+ INNER JOIN
+ %s tm ON sm.schema_id = tm.schema_id
+ LEFT JOIN
+ %s tvi ON tm.table_id = tvi.table_id
+ AND tm.current_version = tvi.version
+ AND tvi.deleted_at = 0
+ WHERE
+ mm.metalake_name = '%s'
+ AND mm.deleted_at = 0
+ AND cm.catalog_name = '%s'
+ AND cm.deleted_at = 0
+ AND sm.schema_name = '%s'
+ AND sm.deleted_at = 0
Review Comment:
[nitpick] The query performs multiple INNER JOINs without leveraging foreign
key relationships or additional join conditions on `deleted_at`. Consider
adding `AND mm.deleted_at = 0`, `AND cm.deleted_at = 0`, `AND sm.deleted_at =
0` to the respective JOIN conditions (not just the WHERE clause) to allow the
query optimizer to filter rows earlier and potentially use the newly added
indexes more efficiently.
```suggestion
%s cm ON mm.metalake_id = cm.metalake_id AND mm.deleted_at = 0
INNER JOIN
%s sm ON cm.catalog_id = sm.catalog_id AND cm.deleted_at = 0
INNER JOIN
%s tm ON sm.schema_id = tm.schema_id AND sm.deleted_at = 0
LEFT JOIN
%s tvi ON tm.table_id = tvi.table_id
AND tm.current_version = tvi.version
AND tvi.deleted_at = 0
WHERE
mm.metalake_name = '%s'
AND cm.catalog_name = '%s'
AND sm.schema_name = '%s'
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]