I understand your concern, but I may not have expressed myself clearly—I don't intend to tightly couple the catalog with specific engine runtime configurations either. What I'm suggesting is a lightweight convention mechanism, not deep integration. My idea is actually quite simple: engines could report just a few boolean flags upon connection (e.g., supports_iceberg: true/false), or we could push the filtering logic down to the engine side via an SDK. This is less about "coupling" and more about a declarative contract. From an engineering perspective, convention over configuration is generally the better path:
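To make that concrete, here is a minimal sketch (purely illustrative — EngineCapabilities, filter_tables, and the supports_* flags are hypothetical names, not an existing HMS or SDK API) of how a client-side SDK could mask metadata based on a few declared booleans:

```python
from dataclasses import dataclass

# Hypothetical capability flags an engine would report on connection.
# None of these names exist in the real HMS API; this is a sketch.
@dataclass(frozen=True)
class EngineCapabilities:
    supports_iceberg: bool = False
    supports_hudi: bool = False
    supports_delta: bool = False

# Map a table's physical format to the flag gating its visibility.
FORMAT_FLAG = {
    "ICEBERG": "supports_iceberg",
    "HUDI": "supports_hudi",
    "DELTA": "supports_delta",
}

def filter_tables(tables, caps: EngineCapabilities):
    """Return only the tables the connecting engine can actually read.

    Plain Hive tables (no special format) are always visible; tables in
    a lake format are shown only if the engine declared support for it.
    """
    visible = []
    for table in tables:
        flag = FORMAT_FLAG.get(table.get("format", "").upper())
        if flag is None or getattr(caps, flag):
            visible.append(table)
    return visible

# Example: an engine with a Hudi connector but no Iceberg support.
caps = EngineCapabilities(supports_iceberg=False, supports_hudi=True)
catalog = [
    {"name": "orders", "format": "ICEBERG"},
    {"name": "events", "format": "HUDI"},
    {"name": "users", "format": ""},  # plain Hive table
]
print([t["name"] for t in filter_tables(catalog, caps)])
# ['events', 'users']
```

The point of the sketch is that the visibility rule is derived entirely from the table's physical format plus a handful of declared booleans — there is no second, per-table rule set to keep in sync.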
Convention (auto-reporting/filtering): The engine declares its capabilities → HMS or the SDK automatically masks incompatible metadata. This maintains a single source of truth: the physical properties of the table (format, location) directly determine its visibility.

Configuration (manual access control): Administrators manually maintain a separate set of ACL rules outside of HMS to hide certain tables. This creates a duplicate definition: the metadata layer already states "this is an Iceberg table," and the permission layer must then state "this engine shouldn't see this Iceberg table." As the number of tables or engines grows, keeping the two in sync manually becomes unmanageable.

In other words, I'm not asking HMS to understand "what connectors Spark 3.4 has installed." I'm simply suggesting that a physical property of the metadata (the format type) should automatically determine its distribution scope. If HMS remains completely agnostic and relies on external permission systems to retroactively restrict visibility, doesn't that actually increase operational complexity?

---- Replied Message ----
From: Denys Kuzmenko <[email protected]>
Date: 03/20/2026 19:12
To: [email protected]
Subject: Re: [Discuss][HIVE-28879] Federated Catalog Support in Apache Hive

I don't think tying catalog behavior to engine capabilities is a good direction. A catalog should remain engine-agnostic and focus purely on metadata management and discovery, not on the execution capabilities of specific query engines. Hive Metastore is intentionally designed as a neutral metadata service. It exposes table definitions, while each engine (e.g., Apache Spark, Trino, etc.) decides whether it can actually process those tables based on its configured connectors or format support.
Introducing capability negotiation would effectively couple the catalog to specific engines and their runtime configuration, which breaks that separation of concerns and makes the catalog responsible for execution-layer logic. If a particular engine does not support a given format or catalog (for example, it does not have the appropriate client/connector installed), the cleaner solution is access control, not metadata filtering. In practice, permissions can simply be removed for users of that engine on catalogs or tables they are not expected to query. Keeping the catalog engine-agnostic preserves interoperability and avoids embedding engine-specific behavior into the metadata layer.
