jerryshao commented on issue #10444:
URL: https://github.com/apache/gravitino/issues/10444#issuecomment-4073289401

   Here is the problem:
   
    The Bug Flow
   
     1. Server restarts → both tableFormatCache (in-memory Guava cache in 
GenericCatalogOperations) and EntityCache (in RelationalEntityStore) are 
cleared.
     2. A request comes in to load a Delta table with access control enabled.
     3. Before the main handler runs, MetadataAuthzHelper.filterByExpression() 
is called for authorization. It calls preloadToCache(TABLE, 
[delta_table_ident]) because TABLE is in SUPPORTED_PRELOAD_ENTITY_TYPES.
     4. preloadToCache() calls entityStore().batchGet() → 
TableMetaService.batchGetTableByIdentifier() → batchSelectTableByIdentifier SQL.
     5. The bug: batchSelectTableByIdentifier in 
TableMetaBaseSQLProvider.java:345 only selects from table_meta without a LEFT 
JOIN to table_version_info. All other SELECT queries (e.g., 
selectTableMetaBySchemaIdAndName, listTablePOsBySchemaId) do include the JOIN. 
This means format and properties come back as null.
     6. POConverters.fromTablePOs() builds a TableEntity with an empty 
properties map — no PROPERTY_TABLE_FORMAT key.
     7. cache.put(entity) stores this incomplete entity in the EntityCache.
     8. The main handler calls GenericCatalogOperations.tableOps(ident) → 
tableFormatCache miss (cleared after restart) → store.get(tableIdent, TABLE, 
TableEntity.class) → EntityCache HIT returns the incomplete entity.
     9. table.properties().getOrDefault(PROPERTY_TABLE_FORMAT, null) returns 
null.
     10. Preconditions.checkArgument(format != null, "Table format for %s is 
null...") throws IllegalArgumentException.
   
     Why Only After Restart?
   
     Before restart, tableFormatCache is populated during createTable(). So 
tableOps() hits the tableFormatCache and never touches the entity store. The 
incomplete entity in EntityCache is irrelevant. After restart, tableFormatCache 
is empty, forcing the entity store lookup, which returns the corrupt cached 
entity.
   
     Why Only for Generic Lakehouse Catalog?
   
     Only GenericCatalogOperations.tableOps() reads PROPERTY_TABLE_FORMAT from 
the entity store at load time. No other catalog type does this.
   
     The Fix
   
     Add the LEFT JOIN with table_version_info to batchSelectTableByIdentifier 
in TableMetaBaseSQLProvider.java:345, matching the pattern used by every other 
SELECT query in the same class. This ensures entities preloaded into the cache 
are complete and include format/properties.
   
   ❯ Do we have similar issues for fileset, model, topic?
   
   ⏺ Searched for 6 patterns, read 3 files (ctrl+o to expand)
   
   ⏺ No, fileset, model, and topic do NOT have this issue. Here's why for each:
   
     Fileset (batchSelectFilesetByIdentifier): Correctly uses INNER JOIN with 
fileset_version_info to fetch properties, storage_location, etc. The version 
data is included. ✅
   
     Topic (batchSelectTopicByIdentifier): Topic has no separate version table 
— comment and properties are stored directly in topic_meta and are selected in 
the batch query. ✅
   
     Model (batchSelectModelByIdentifier): Same as Topic — model_comment and 
model_properties live directly in model_meta, no separate version table, so no 
JOIN is needed. ✅
   
     Table (batchSelectTableByIdentifier): This is the only one affected. 
Unlike the others, format and properties live in table_version_info (a separate 
version table), but the batch query only selects from table_meta without the 
LEFT JOIN. This returns incomplete TablePO objects (null format, null 
properties), which corrupt the EntityCache when preloaded by
     MetadataAuthzHelper.preloadToCache().


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to