Hi, thanks for summarizing all the challenges about Federated Catalog! I also think it is something we should work on.
On the topic of filtering unsupported tables, I also think we can spin off a separate thread. Hive has a feature for declaring and checking a processor's capabilities [1]. We could use similar logic, but I haven't found the perfect solution yet. For example, even if HMS filters out those tables, CREATE TABLE with the same table name must still fail. I personally think the client should throw a friendly error message on read instead.

Regards,
Okumin

- [1] https://github.com/apache/hive/blob/rel/release-4.2.0/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java

On Tue, Mar 31, 2026 at 1:40 AM Butao Zhang <[email protected]> wrote:
>
> The suggestion proposed by Zhihua is: "add a plugin service between the
> Metastore client and the engine, this plugin will translate the Hive metadata
> to anything the engine wants."
>
> This essentially refers to implementing engine-specific read/write plugins
> for various catalogs in HMS. These read/write plugins rely on engine-specific
> capabilities and must be implemented according to the interface
> specifications exposed by the engine. For example, in Gravitino, based on
> Spark's DataSource V2 interface
> (https://github.com/apache/gravitino/tree/main/spark-connector), read/write
> support for various catalogs in Gravitino has been implemented.
>
> From my perspective, implementing the engine side may be a different story.
> This is because such implementation requires deep understanding of the open
> interfaces of specific engines (such as Spark and Trino), and involves
> significant development effort. However, if we enhance HMS's multi-catalog
> capabilities, we can encourage more community developers to get involved in
> the future and implement read/write plugins for HMS catalogs across different
> engines.
>
> Thanks,
> Butao Zhang
>
>
> On 2026/03/24 04:54:46 Zhihua Deng wrote:
> > +1 for engine-agnostic, unified metadata and discovery, multi-tenancy
> > and granular ACLs catalog federation.
> >
> > It's important to consider how the engine will consume the metadata before
> > we start. As the catalog is engine-agnostic, I would like to add a plugin
> > service between the Metastore client and the engine, this plugin will
> > translate the Hive metadata to anything the engine wants.
> >
> > On 2026/03/23 19:45:44 Sai Hemanth Gantasala wrote:
> > > +1 to Deny's and Butao's suggestions.
> > >
> > > Lisoda,
> > > 1) I agree that relying on external permission systems for basic table
> > > visibility can be complex and error-prone. However, introducing capability
> > > filtering, even based on format type, still moves HMS away from its core
> > > role as an engine-agnostic metadata service. We need a solution that
> > > addresses the operational complexity without compromising HMS neutrality.
> > > 2) I see your point on operational complexity, but the need for external
> > > permissions goes beyond format support, it is essential for multi-tenancy
> > > and granular security. We must be able to hide a sensitive Iceberg table
> > > from a user, even if their engine is capable of reading Iceberg.
> > > Separating the security policy (ACLs) from the metadata definition (HMS)
> > > remains the correct architectural approach IMO.
> > >
> > > Thanks,
> > > Sai
> > >
> > > On Mon, Mar 23, 2026 at 3:18 AM Butao Zhang <[email protected]> wrote:
> > >
> > > > I mostly agree with Denys's viewpoint. That is, when querying Iceberg and
> > > > Hudi tables in HMS, engines need to implement and configure their own
> > > > connectors. These connectors are specific to each engine and have nothing
> > > > to do with HMS itself.
> > > > HMS serves as a neutral, unified metadata management service, responsible
> > > > only for managing the lifecycle of catalogs (such as creation and
> > > > deletion) and providing unified metadata authorization services.
> > > >
> > > >
> > > > Add some extra information to respond to lisoda:
> > > >
> > > > 1) Q1: HMS may store various types of tables (e.g., Iceberg, Hudi), and
> > > > some engines may not be able to query certain types of tables stored in
> > > > HMS.
> > > > First, this issue seems unrelated to the multi-catalog or federated
> > > > catalog approach I proposed. This is essentially a problem where multiple
> > > > table formats (Iceberg, Hudi, etc.) are mixed within a single HMS catalog.
> > > > When a compute engine is configured with this HMS catalog, it may be able
> > > > to see all tables via `SHOW TABLES`, but it may only be able to query a
> > > > subset of them. This issue should be handled at the compute engine level.
> > > > For example, the engine can determine whether a table should be visible or
> > > > whether it can be queried based on table attributes like `table_type`.
> > > > For instance, StarRocks provides a catalog/connector called the Unified
> > > > Catalog
> > > > (https://docs.starrocks.io/docs/data_source/catalog/unified_catalog/),
> > > > which can query multiple table formats (such as Iceberg and Hudi) stored
> > > > in the same HMS.
> > > >
> > > > If users only want to query a specific type of table stored in the same
> > > > HMS, such as Iceberg tables, they can create a dedicated
> > > > catalog/connector, like the Iceberg Catalog
> > > > (https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_catalog/).
> > > > This catalog/connector allows users to see only Iceberg tables when
> > > > running `SHOW TABLES`, and any other table formats will be invisible.
> > > >
> > > > Additionally, based on my tests, when using
> > > > `org.apache.iceberg.spark.SparkSessionCatalog`, Spark should be able to
> > > > query both Hive tables and Iceberg tables through the HMS catalog.
> > > >
> > > > 2) Q2: Regarding the issue of circular catalogs, I believe this does not
> > > > exist. When a compute engine is configured with an HMS catalog, that HMS
> > > > catalog can only see its own catalog namespace (databases and tables).
> > > > The engine cannot see information from other catalogs through this HMS
> > > > catalog.
> > > >
> > > >
> > > > Thanks,
> > > > Butao Zhang
> > > > ---- Replied Message ----
> > > > From lisoda<[email protected]> <[email protected]>
> > > > Date 3/20/2026 22:53
> > > > To dev<[email protected]> <[email protected]>
> > > > Subject Re: [Discuss][HIVE-28879] Federated Catalog Support in Apache Hive
> > > > I understand your concern, but I may not have expressed myself clearly—I
> > > > don't intend to tightly couple the catalog with specific engine runtime
> > > > configurations either. What I'm suggesting is a lightweight convention
> > > > mechanism, not deep integration.
> > > > My idea is actually quite simple: engines could report just a few boolean
> > > > flags upon connection (e.g., supports_iceberg: true/false), or we could
> > > > push the filtering logic down to the engine side via an SDK. This is less
> > > > about "coupling" and more about a declarative contract.
> > > > From an engineering perspective, convention over configuration is
> > > > generally the better path:
> > > >
> > > > Convention (auto-reporting/filtering): The engine declares its
> > > > capabilities → HMS or the SDK automatically masks incompatible metadata.
> > > > This maintains a single source of truth—the physical properties of the
> > > > table (format, location) directly determine its visibility.
> > > >
> > > > Configuration (manual access control): Administrators manually maintain a
> > > > separate set of ACL rules outside of HMS to hide certain tables. This
> > > > essentially creates duplicate definition—the metadata layer already
> > > > defines "this is an Iceberg table," and then the permission layer has to
> > > > define "this engine shouldn't see this Iceberg table." As the number of
> > > > tables or engines scales, this manual synchronization overhead becomes
> > > > unmanageable.
> > > > In other words, I'm not asking HMS to understand "what connectors Spark
> > > > 3.4 has installed." I'm simply suggesting that the physical properties of
> > > > the metadata (the format type) should automatically determine its
> > > > distribution scope. If HMS remains completely agnostic and relies on
> > > > external permission systems to retroactively hide visibility, doesn't
> > > > that actually increase operational complexity?
> > > >
> > > >
> > > > ---- Replied Message ----
> > > > From Denys Kuzmenko<[email protected]> <[email protected]>
> > > > Date 03/20/2026 19:12
> > > > To [email protected]
> > > > Cc
> > > > Subject Re: [Discuss][HIVE-28879] Federated Catalog Support in Apache Hive
> > > > I don’t think tying catalog behavior to engine capabilities is a good
> > > > direction. A catalog should remain engine-agnostic and focus purely on
> > > > metadata management and discovery, not on the execution capabilities of
> > > > specific query engines.
> > > >
> > > > Hive Metastore is intentionally designed as a neutral metadata service.
> > > > It exposes table definitions, while each engine (e.g., Apache Spark,
> > > > Trino, etc.) decides whether it can actually process those tables based
> > > > on its configured connectors or format support.
> > > > Introducing capability negotiation would effectively couple the catalog
> > > > to specific engines and their runtime configuration, which breaks that
> > > > separation of concerns and makes the catalog responsible for
> > > > execution-layer logic.
> > > >
> > > > If a particular engine does not support a given format or catalog (for
> > > > example, it does not have the appropriate client/connector installed),
> > > > the cleaner solution is access control, not metadata filtering. In
> > > > practice, permissions can simply be removed for users of that engine on
> > > > catalogs or tables they are not expected to query.
> > > >
> > > > Keeping the catalog engine-agnostic preserves interoperability and avoids
> > > > embedding engine-specific behavior into the metadata layer.
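
For reference, here is a minimal Python sketch of the client-side behavior suggested at the top of this thread: HMS returns every table unfiltered, and the engine's client decides from table metadata whether it can read one, raising a friendly error on read instead of relying on HMS to hide the table. All names (`TableMeta`, `check_readable`, `visible_tables`, `UnsupportedTableFormatError`) are hypothetical, and reading the format from a `table_type` parameter with a `HIVE` default is an assumption made for illustration. It does not describe the API of Hive or of any engine.

```python
from dataclasses import dataclass, field


class UnsupportedTableFormatError(Exception):
    """Raised on read when the engine cannot process a table's format."""


@dataclass(frozen=True)
class TableMeta:
    # Hypothetical stand-in for table metadata returned by a metastore client.
    name: str
    parameters: dict = field(default_factory=dict)


def table_format(table: TableMeta) -> str:
    # Assumption: the format is recorded in a `table_type` parameter and
    # tables without it are plain Hive tables.
    return table.parameters.get("table_type", "HIVE").upper()


def check_readable(table: TableMeta, supported_formats: set) -> TableMeta:
    """Return the table if the engine can read it, else raise a clear error."""
    fmt = table_format(table)
    if fmt not in supported_formats:
        raise UnsupportedTableFormatError(
            f"Table '{table.name}' uses format {fmt}, which this engine "
            f"cannot read. Supported formats: "
            f"{', '.join(sorted(supported_formats))}."
        )
    return table


def visible_tables(tables: list, supported_formats: set) -> list:
    """Optional engine-side filter for a SHOW TABLES style listing."""
    return [t.name for t in tables if table_format(t) in supported_formats]
```

With this shape the table still exists in HMS, so a CREATE TABLE with the same name fails for an understandable reason, and the user who tries to read it gets an explicit message rather than a "table not found" error.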
