zhangbutao commented on PR #6088:
URL: https://github.com/apache/hive/pull/6088#issuecomment-3580907907

   @dengzhhu653 Thanks for the suggestion!
   
   > It seems to me this is a significant/big change, and the user should 
follow a new way of catalog.db.table to request the table as the Trino/Presto 
does today.
   
   I don't think that is quite right. My goal is that users who don't use this 
feature can keep using the classic Hive approach, such as `select * from 
table` or `select * from db.table`. They don't need to worry about catalogs 
because their current session has a default catalog. Users can also 
specify a catalog name as a prefix when querying a table, for example `select * 
from hive.db.tbl`. 
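   
   To make the resolution rule concrete, here is a minimal sketch of how a 
one-, two- or three-part name could be expanded against the session defaults. 
`TableNameResolver` and its parameters are hypothetical names used only for 
illustration; this is not code from the PR or from Hive, and it ignores quoted 
identifiers.
   
   ```java
   /** Hypothetical helper for illustration only; not part of Hive or this PR. */
   final class TableNameResolver {
       private TableNameResolver() {}

       /**
        * Expands a 1-, 2- or 3-part table name to catalog.db.table using the
        * session's default catalog and database (assumed inputs).
        */
       static String resolve(String name, String sessionCatalog, String sessionDatabase) {
           // Keep trailing empty parts so that inputs like "db." are rejected.
           String[] parts = name.split("\\.", -1);
           for (String part : parts) {
               if (part.isEmpty()) {
                   throw new IllegalArgumentException("Empty name part in: " + name);
               }
           }
           switch (parts.length) {
               case 1: // table -> default catalog, default database
                   return sessionCatalog + "." + sessionDatabase + "." + parts[0];
               case 2: // db.table -> default catalog
                   return sessionCatalog + "." + parts[0] + "." + parts[1];
               case 3: // catalog.db.table -> already fully qualified
                   return name;
               default:
                   throw new IllegalArgumentException("Too many name parts in: " + name);
           }
       }
   }
   ```
   
   With a session whose default catalog is `hive` and default database is 
`default`, `tbl`, `default.tbl` and `hive.default.tbl` all resolve to the same 
table under this sketch, which is why existing queries would keep working.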
   
   Additionally, if users want to run federated queries across multiple 
data sources in Hive, similar to Trino, they would create multiple 
catalogs and run queries like the one below. This federated query capability 
is a new feature and does not pose compatibility issues for existing users.
   ```sql
   SELECT 
       u.user_name,
       a.action_time
   FROM mysql.test.users u
   JOIN hive.default.user_actions a 
       ON u.user_id = a.user_id;
   ```
   
   
   > I think we can have a more simple idea, let's push the catalog awareness 
down into the Metastore client. For example,
   > SET CATALOG testcat; -> change to a metaconf setting: set 
metaconf:metastore.catalog.default = testcat, default hive, this configuration 
will be propagated to HMS, every HMS API request will be appended with the 
testcat, both client and server can handle it.
   
   Although this allows switching catalogs, the client can still only access 
the databases and tables of one catalog at a time, so a single query cannot 
reference tables from two catalogs.
   
   
   
   > For cross-catalog queries, we can also do it in a similar way, we can 
introduce a new catalog awareness layer between session client and the thrift 
client, as SessionHiveMetaStoreClient -> CatalogAwarenessMetastoreClient -> 
ThriftHiveMetaStoreClient. The CatalogAwarenessMetastoreClient can acknowledge 
of the catalog through configuration or properties stored in 
HMS([HIVE-27186](https://issues.apache.org/jira/browse/HIVE-27186)). Something 
as 
hive.matastore.catalog.mapping.pattern=iceberg_catalog:ice_db1*,hive_db1,hive_db2.ice_tab*;hive:default,acid_db*,
 the CatalogAwarenessMetastoreClient even can point to a third-party catalog 
partner.
   
   This method allows accessing databases and tables across multiple catalogs, 
but it is quite inelegant. Moreover, as you mentioned, it cannot support having 
the same database names across different catalogs. This solution is somewhat 
similar to https://github.com/ExpediaGroup/waggle-dance. Additionally, it 
requires users to constantly modify parameters to add or remove database names 
in `hive.matastore.catalog.mapping.pattern`, which seems very cumbersome...
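   
   As a rough illustration of the name-collision problem, here is a sketch of 
database-name-based routing. The parsing rules are my assumption about how such 
a mapping string might be read (`catalog:glob,glob;catalog:glob`, first match 
wins); `MappingRouter` is a hypothetical class, and table-level patterns such 
as `hive_db2.ice_tab*` are deliberately left out.
   
   ```java
   import java.util.AbstractMap.SimpleEntry;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.regex.Pattern;

   /** Hypothetical router for illustration only; the mapping format is assumed. */
   final class MappingRouter {
       // Ordered (pattern, catalog) rules; the first matching rule wins.
       private final List<Map.Entry<Pattern, String>> rules = new ArrayList<>();

       MappingRouter(String mapping) {
           for (String entry : mapping.split(";")) {
               if (entry.trim().isEmpty()) {
                   continue;
               }
               String[] kv = entry.split(":", 2);
               if (kv.length < 2) {
                   throw new IllegalArgumentException("Expected catalog:patterns, got: " + entry);
               }
               for (String glob : kv[1].split(",")) {
                   if (glob.trim().isEmpty()) {
                       continue;
                   }
                   // Quote the glob literally, then turn each '*' into '.*'.
                   String regex = Pattern.quote(glob.trim()).replace("*", "\\E.*\\Q");
                   rules.add(new SimpleEntry<>(Pattern.compile(regex), kv[0].trim()));
               }
           }
       }

       /** Returns the catalog that owns the database name, or null if none matches. */
       String catalogFor(String database) {
           for (Map.Entry<Pattern, String> rule : rules) {
               if (rule.getKey().matcher(database).matches()) {
                   return rule.getValue();
               }
           }
           return null;
       }
   }
   ```
   
   Because the lookup key is only the database name, a mapping like 
`a:default;b:default` always routes `default` to catalog `a` in this sketch; 
the `default` database in catalog `b` can never be reached.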
   
   
   Finally, I want to reference the multi/federated catalog capabilities of 
Trino and StarRocks/Doris. I believe their design approaches are quite 
reasonable. I'm considering implementing this capability with the principle of 
minimizing code modifications. Considerable preliminary work has already been 
done on the HMS side with HIVE-18685. The subsequent work mainly involves 
connecting all catalog capabilities from the HS2 side to the HMS side. I've 
been thinking about writing down the related design considerations, but haven't 
had enough time yet... I will provide more thoughts on this later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

