Alan Gates commented on HIVE-18685:

Lots of good questions, feedback, and thoughts, thanks.  There's a lot to 
respond to so I'll answer them in a series of comments.
{quote}Should we also consider sharding by catalog so that each catalog could 
be served by different server instance or served by different backing DB? 
Otherwise we'll have common issues of contending over the same DB locks.

Would you consider possibility of each catalog implemented by potentially 
different backing store?
{quote}
I thought of having each catalog stored in a different instance of the backend 
RDBMS. But this runs into a few problems, and it complicates the code 
significantly:
 # Now you'd have an array of RawStore implementations to manage, and you'd 
always have to make sure you were using the right one. 
 # You still have to have a small 'master' RDBMS where you keep the pointers to 
the RDBMS for each catalog.
 # Each time a catalog is created Hive would have to be able to install a new 
instance of its catalog tables in the RDBMS. This is the show stopper, as many 
users configure their system such that Hive does not have the authority to 
create new schemas/databases and tables in the RDBMS on the fly.
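
To make the first two points concrete, here is a minimal sketch of what per-catalog routing might look like. It is not Hive code: {{Store}} is a hypothetical stand-in for RawStore, and {{JdbcStore}}, {{CatalogRouterSketch}}, and the JDBC URLs are all made up for illustration. The in-memory map plays the role of the small 'master' RDBMS.

```java
import java.util.HashMap;
import java.util.Map;

final class CatalogRouterSketch {

    /** Hypothetical stand-in for RawStore; the real interface is far larger. */
    interface Store {
        String describe();
    }

    /** Hypothetical store bound to one backing RDBMS URL. */
    static final class JdbcStore implements Store {
        private final String jdbcUrl;

        JdbcStore(String jdbcUrl) {
            this.jdbcUrl = jdbcUrl;
        }

        @Override
        public String describe() {
            return "store@" + jdbcUrl;
        }
    }

    /** Plays the role of the 'master' RDBMS: catalog name -> pointer to its RDBMS. */
    private final Map<String, String> masterPointers = new HashMap<>();

    /** One store instance per catalog, created on first use. */
    private final Map<String, Store> stores = new HashMap<>();

    void registerCatalog(String catalog, String jdbcUrl) {
        // A real implementation would also have to install the metastore tables
        // in the new RDBMS here, which is the step many sites do not permit.
        masterPointers.put(catalog, jdbcUrl);
    }

    /** Every metastore call would need to go through a lookup like this one. */
    Store storeFor(String catalog) {
        String url = masterPointers.get(catalog);
        if (url == null) {
            throw new IllegalArgumentException("Unknown catalog: " + catalog);
        }
        return stores.computeIfAbsent(catalog, c -> new JdbcStore(url));
    }

    public static void main(String[] args) {
        CatalogRouterSketch router = new CatalogRouterSketch();
        router.registerCatalog("sales", "jdbc:postgresql://db1/hms_sales");
        router.registerCatalog("hr", "jdbc:postgresql://db2/hms_hr");
        System.out.println(router.storeFor("sales").describe());
        System.out.println(router.storeFor("hr").describe());
    }
}
```

Even in this toy form, every caller has to name the catalog before it can touch a store, and a missing pointer becomes a new failure mode.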

Are we seeing issues where the DB locks are slowing us down?

As mentioned in the document, I would like to use the catalog concept to 
connect to 'foreign' stores, like Druid, HBase, etc. Basically this would be an 
extension of the StorageHandler concept to include metadata. I think you could 
also use this to connect to other Hive instances. I envision it working 
something like this: The admin maps a foreign store or external Hive instance 
into a catalog with a command like

{{create catalog hbasecat uses HBaseStorageHandler ...}}
{{create catalog other_hive_instance references thrifts://otherhive.com}}

These are just examples, I haven't worked out the details of how it would work.
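
As a purely hypothetical sketch of the bookkeeping such commands might imply, the metastore could record, per catalog, whether it is native, backed by a storage handler, or a reference to another Hive instance. None of the names below ({{ForeignCatalogSketch}}, {{Kind}}, {{CatalogDef}}) exist in Hive; they are assumptions for illustration only, and the handler name and URI are simply the ones from the example commands.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ForeignCatalogSketch {

    /** Hypothetical catalog kinds, one per form of the example commands. */
    enum Kind { NATIVE, STORAGE_HANDLER, REMOTE_HIVE }

    /** Hypothetical catalog record: a name plus where its metadata lives. */
    static final class CatalogDef {
        final String name;
        final Kind kind;
        final String target; // handler name or remote URI; null for native catalogs

        CatalogDef(String name, Kind kind, String target) {
            this.name = name;
            this.kind = kind;
            this.target = target;
        }
    }

    private final Map<String, CatalogDef> catalogs = new LinkedHashMap<>();

    /** Models "create catalog <name> uses <handler>". */
    void createWithHandler(String name, String handler) {
        catalogs.put(name, new CatalogDef(name, Kind.STORAGE_HANDLER, handler));
    }

    /** Models "create catalog <name> references <uri>". */
    void createReference(String name, String uri) {
        catalogs.put(name, new CatalogDef(name, Kind.REMOTE_HIVE, uri));
    }

    CatalogDef lookup(String name) {
        CatalogDef def = catalogs.get(name);
        if (def == null) {
            throw new IllegalArgumentException("Unknown catalog: " + name);
        }
        return def;
    }

    public static void main(String[] args) {
        ForeignCatalogSketch registry = new ForeignCatalogSketch();
        registry.createWithHandler("hbasecat", "HBaseStorageHandler");
        registry.createReference("other_hive_instance", "thrifts://otherhive.com");
        CatalogDef def = registry.lookup("hbasecat");
        System.out.println(def.name + " -> " + def.kind + " (" + def.target + ")");
    }
}
```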

> Add catalogs to metastore
> -------------------------
>                 Key: HIVE-18685
>                 URL: https://issues.apache.org/jira/browse/HIVE-18685
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Major
>         Attachments: HMS Catalog Design Doc.pdf
> SQL supports two levels of namespaces, called in the spec catalogs and 
> schemas (with schema being equivalent to Hive's database).  I propose to add 
> the upper level of catalog.  The attached design doc covers the use cases, 
> requirements, and brief discussion of how it will be implemented in a 
> backwards compatible way.
