I understand your concern, but I may not have expressed myself clearly—I don't 
intend to tightly couple the catalog with specific engine runtime 
configurations either. What I'm suggesting is a lightweight convention 
mechanism, not deep integration.
My idea is actually quite simple: engines could report just a few boolean flags 
upon connection (e.g., `supports_iceberg: true/false`), or we could push the 
filtering logic down to the engine side via an SDK. This is less about 
"coupling" and more about a declarative contract.
From an engineering perspective, convention over configuration is generally the 
better path:

Convention (auto-reporting/filtering): The engine declares its capabilities → 
HMS or the SDK automatically masks incompatible metadata. This maintains a 
single source of truth—the physical properties of the table (format, location) 
directly determine its visibility.

Configuration (manual access control): Administrators manually maintain a 
separate set of ACL rules outside of HMS to hide certain tables. This 
essentially creates a duplicate definition—the metadata layer already defines 
"this is an Iceberg table," and then the permission layer has to define "this 
engine shouldn't see this Iceberg table." As the number of tables or engines 
scales, this manual synchronization overhead becomes unmanageable.
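The convention side of that comparison can be sketched in a few lines. Again, the table shape and names here are hypothetical; the point is only that visibility falls out of the table's physical format plus the engine's declared formats, with no separate ACL rules to keep in sync:

```python
# Hypothetical sketch of the filtering side: HMS (or a client-side SDK)
# masks tables whose physical format the engine did not declare support for.
def visible_tables(tables, engine_formats):
    """Return only the tables the connected engine declared it can process."""
    return [t for t in tables if t["format"] in engine_formats]


catalog = [
    {"name": "orders", "format": "ICEBERG"},
    {"name": "clicks", "format": "HUDI"},
    {"name": "users",  "format": "HIVE"},
]

# An engine that declared only Iceberg and plain Hive support sees
# 'orders' and 'users'; the Hudi table is masked automatically.
spark_declared = {"ICEBERG", "HIVE"}
print([t["name"] for t in visible_tables(catalog, spark_declared)])
# ['orders', 'users']
```

Nothing here is engine-specific logic inside the catalog; it is a pure function of metadata that already exists plus the flags the engine reported.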
In other words, I'm not asking HMS to understand "what connectors Spark 3.4 has 
installed." I'm simply suggesting that the physical properties of the metadata 
(the format type) should automatically determine its distribution scope. If HMS 
remains completely agnostic and relies on external permission systems to hide 
tables after the fact, doesn't that actually increase operational complexity?



---- Replied Message ----
| From | Denys Kuzmenko<[email protected]> |
| Date | 03/20/2026 19:12 |
| To | [email protected] |
| Cc | |
| Subject | Re: [Discuss][HIVE-28879] Federated Catalog Support in Apache Hive |
I don’t think tying catalog behavior to engine capabilities is a good 
direction. A catalog should remain engine-agnostic and focus purely on metadata 
management and discovery, not on the execution capabilities of specific query 
engines.

Hive Metastore is intentionally designed as a neutral metadata service. It 
exposes table definitions, while each engine (e.g., Apache Spark, Trino, etc.) 
decides whether it can actually process those tables based on its configured 
connectors or format support. Introducing capability negotiation would 
effectively couple the catalog to specific engines and their runtime 
configuration, which breaks that separation of concerns and makes the catalog 
responsible for execution-layer logic.

If a particular engine does not support a given format or catalog (for example, 
it does not have the appropriate client/connector installed), the cleaner 
solution is access control, not metadata filtering. In practice, permissions 
can simply be removed for users of that engine on catalogs or tables they are 
not expected to query.

Keeping the catalog engine-agnostic preserves interoperability and avoids 
embedding engine-specific behavior into the metadata layer.
