[ 
https://issues.apache.org/jira/browse/HIVE-18685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364045#comment-16364045
 ] 

Peter Vary edited comment on HIVE-18685 at 2/14/18 1:44 PM:
------------------------------------------------------------

Hi [~alangates],

Thanks for the quick answers!
{quote}Are we seeing issues where the DB locks are slowing us down?
{quote}
We have definitely seen locking problems on the customer side during DDL-intensive 
periods; see the attached mysql.log.

That particular problem was solved by adding more resources to MySQL, but in my 
mind this is only a temporary fix. We have long-running transactions over tables 
with unique indexes (database, table, notification) mixed with file system 
operations (possibly against S3). These unique keys can easily create situations 
where we end up serializing the requests.

 

When I read this in the design document:
{quote}Possible Future Use for Caching:

We need a way in the metastore to limit the number of objects and transactions 
to a size that can be managed by a single server so that we can effectively 
cache data, transactions, and locks in the metastore. However, for obvious 
reasons, we do not want to limit the metastore to running on a single server. 
Catalogs may offer a natural place to define caching that scopes size of 
objects and transactions reasonably while not limiting the overall size of the 
metastore.
{quote}
I thought the goal was that a single MetaStore instance should handle only a 
limited number of catalogs, so that it can cache and serve those catalogs 
effectively. My assumption was that different MetaStore instances would serve 
different sets of catalogs, and that a MetaStore client - like a DFS client - 
would first find out which MetaStore(s) (analogous to DataNodes) handle the 
given catalog, and then query the data from there. I do not think we need a 
solution as complicated as a NameNode for this; a single ZooKeeper node could 
serve as the configuration store, and it can easily be updated when a new 
catalog is added.
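Just to make the idea concrete, here is a minimal sketch of that client-side lookup. Everything here is hypothetical: the class name, the MetaStore URIs, and the in-memory map standing in for the single ZooKeeper configuration node are my assumptions, not an actual design.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of the client-side catalog routing described above: before talking
 * to a MetaStore, the client resolves which instance serves a given catalog,
 * much like a DFS client locating a DataNode. The ConcurrentHashMap stands
 * in for the ZooKeeper configuration node; names and URIs are hypothetical.
 */
public class CatalogLocator {
    // catalog name -> Thrift URI of the MetaStore instance serving it
    private final Map<String, String> catalogToMetastore = new ConcurrentHashMap<>();

    /** Called when a new catalog is added, e.g. when a ZooKeeper watch fires. */
    public void register(String catalog, String metastoreUri) {
        catalogToMetastore.put(catalog, metastoreUri);
    }

    /** Resolve the MetaStore instance responsible for a catalog. */
    public String locate(String catalog) {
        String uri = catalogToMetastore.get(catalog);
        if (uri == null) {
            throw new IllegalArgumentException("No MetaStore registered for catalog: " + catalog);
        }
        return uri;
    }

    public static void main(String[] args) {
        CatalogLocator locator = new CatalogLocator();
        locator.register("sales", "thrift://hms-1.example.com:9083");
        locator.register("analytics", "thrift://hms-2.example.com:9083");
        System.out.println(locator.locate("sales"));
    }
}
```

In a real deployment the map contents would be read from (and watched on) the ZooKeeper node, so clients pick up newly added catalogs without redeployment.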

 

As for the Thrift issue:
{quote}I think I will likely still change HiveMetaStoreClient to add methods 
with explicit catalog name, but that is much easier than adding thrift methods. 
 And in HiveMetaStoreClient I can explicitly deprecate the old methods, giving 
users a warning not to continue using them.
{quote}
This sounds like a good interim solution to the Thrift API problem.
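For what it's worth, the deprecation pattern you describe might look roughly like this in HiveMetaStoreClient. The method name, the return value, and the default-catalog constant below are my assumptions for illustration, not the actual patch:

```java
/**
 * Sketch of the backwards-compatibility pattern described above: the old
 * method stays, is marked deprecated, and forwards to a new catalog-aware
 * overload using a default catalog. Names are assumptions, not HIVE-18685.
 */
public class MetaStoreClientSketch {
    static final String DEFAULT_CATALOG = "hive";

    /** New catalog-aware method; a real client would issue a Thrift call here. */
    public String getDatabaseLocation(String catalog, String database) {
        return "/warehouse/" + catalog + "/" + database + ".db";
    }

    /**
     * Old method kept so existing callers keep compiling and running.
     * @deprecated use {@link #getDatabaseLocation(String, String)} with an explicit catalog.
     */
    @Deprecated
    public String getDatabaseLocation(String database) {
        return getDatabaseLocation(DEFAULT_CATALOG, database);
    }

    public static void main(String[] args) {
        MetaStoreClientSketch client = new MetaStoreClientSketch();
        // Old callers keep working, but now get a deprecation warning at compile time.
        System.out.println(client.getDatabaseLocation("sales"));
    }
}
```

The nice part is that the compiler warning nudges users toward the catalog-aware API without breaking anyone, exactly as you suggest.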

 

Thanks,

Peter 



> Add catalogs to metastore
> -------------------------
>
>                 Key: HIVE-18685
>                 URL: https://issues.apache.org/jira/browse/HIVE-18685
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Major
>         Attachments: HMS Catalog Design Doc.pdf
>
>
> SQL supports two levels of namespaces, called in the spec catalogs and 
> schemas (with schema being equivalent to Hive's database).  I propose to add 
> the upper level of catalog.  The attached design doc covers the use cases, 
> requirements, and brief discussion of how it will be implemented in a 
> backwards compatible way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
