jerryshao commented on issue #10444:
URL: https://github.com/apache/gravitino/issues/10444#issuecomment-4073289401
Here is the problem:
The Bug Flow
1. Server restarts → both tableFormatCache (in-memory Guava cache in
GenericCatalogOperations) and EntityCache (in RelationalEntityStore) are
cleared.
2. A request comes in to load a Delta table with access control enabled.
3. Before the main handler runs, MetadataAuthzHelper.filterByExpression()
is called for authorization. It calls preloadToCache(TABLE,
[delta_table_ident]) because TABLE is in SUPPORTED_PRELOAD_ENTITY_TYPES.
4. preloadToCache() calls entityStore().batchGet() →
TableMetaService.batchGetTableByIdentifier() → batchSelectTableByIdentifier SQL.
5. The bug: batchSelectTableByIdentifier in
TableMetaBaseSQLProvider.java:345 only selects from table_meta without a LEFT
JOIN to table_version_info. All other SELECT queries (e.g.,
selectTableMetaBySchemaIdAndName, listTablePOsBySchemaId) do include the JOIN.
This means format and properties come back as null.
6. POConverters.fromTablePOs() builds a TableEntity with an empty
properties map — no PROPERTY_TABLE_FORMAT key.
7. cache.put(entity) stores this incomplete entity in the EntityCache.
8. The main handler calls GenericCatalogOperations.tableOps(ident) →
tableFormatCache miss (cleared after restart) → store.get(tableIdent, TABLE,
TableEntity.class) → EntityCache HIT returns the incomplete entity.
9. table.properties().getOrDefault(PROPERTY_TABLE_FORMAT, null) returns
null.
10. Preconditions.checkArgument(format != null, "Table format for %s is
null...") throws IllegalArgumentException.
Why Only After Restart?
Before restart, tableFormatCache is populated during createTable(). So
tableOps() hits the tableFormatCache and never touches the entity store. The
incomplete entity in EntityCache is irrelevant. After restart, tableFormatCache
is empty, forcing the entity store lookup, which returns the corrupt cached
entity.
Why Only for Generic Lakehouse Catalog?
Only GenericCatalogOperations.tableOps() reads PROPERTY_TABLE_FORMAT from
the entity store at load time. No other catalog type does this.
The Fix
Add the LEFT JOIN with table_version_info to batchSelectTableByIdentifier
in TableMetaBaseSQLProvider.java:345, matching the pattern used by every other
SELECT query in the same class. This ensures entities preloaded into the cache
are complete and include format/properties.
❯ Do we have similar issues for fileset, model, topic?
⏺ Searched for 6 patterns, read 3 files (ctrl+o to expand)
⏺ No, fileset, model, and topic do NOT have this issue. Here's why for each:
Fileset (batchSelectFilesetByIdentifier): Correctly uses INNER JOIN with
fileset_version_info to fetch properties, storage_location, etc. The version
data is included. ✅
Topic (batchSelectTopicByIdentifier): Topic has no separate version table
— comment and properties are stored directly in topic_meta and are selected in
the batch query. ✅
Model (batchSelectModelByIdentifier): Same as Topic — model_comment and
model_properties live directly in model_meta, no separate version table, so no
JOIN is needed. ✅
Table (batchSelectTableByIdentifier): This is the only one affected.
Unlike the others, format and properties live in table_version_info (a separate
version table), but the batch query only selects from table_meta without the
LEFT JOIN. This returns incomplete TablePO objects (null format, null
properties), which corrupt the EntityCache when preloaded by
MetadataAuthzHelper.preloadToCache().
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]