Vihang Karajgaonkar created HIVE-16452:
------------------------------------------
Summary: Database UUID for metastore DB
Key: HIVE-16452
URL: https://issues.apache.org/jira/browse/HIVE-16452
Project: Hive
Issue Type: New Feature
Components: Metastore
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar
In cloud environments, the same database instance may serve as the long-running
metadata persistence layer while multiple HMS instances access it. These HMS
instances could be running at the same time or, in the case of transient
workloads, come up on demand. HMS is used by multiple projects in the Hadoop
ecosystem as the de facto metadata keeper for various SQL engines on the
cluster. Currently, there is no way to uniquely identify the database instance
that backs an HMS. For example, if two HMS instances run on top of the same
metastore DB, there is no way to tell that the data received through both
metastore clients comes from the same database. Similarly, if multiple HMS
services come up and go away for transient workloads, an external application
fetching data from an HMS has no way to tell that these HMS instances are in
fact returning the same data.
We could potentially use the combination of the javax.jdo.option.ConnectionURL
and javax.jdo.option.ConnectionDriverName configuration of each HMS instance,
but this approach may not be very robust. If the database is migrated to another
server for some reason, the ConnectionURL can change. Having a UUID in the
metastore DB that can be queried through a Thrift API can help solve this
problem. This way, any application talking to multiple HMS instances can
recognize whether the data is coming from the same backing database.
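To illustrate how a client application might use such a UUID, here is a minimal
Java sketch. It is not an existing Hive API: the UuidSource interface, its
getMetastoreDbUuid() method, and the instance names are all hypothetical
stand-ins for whatever Thrift call this feature ends up exposing. The sketch
only shows the grouping idea: HMS instances that report the same UUID are
treated as sharing one backing database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class MetastoreDbUuidSketch {

    /** Hypothetical stand-in for a metastore client; not an existing Hive API. */
    public interface UuidSource {
        String name();
        /** Assumed call returning the UUID stored in the backing metastore DB. */
        String getMetastoreDbUuid();
    }

    /** Test helper: a fake HMS instance that always reports the given UUID. */
    public static UuidSource fixed(final String name, final String uuid) {
        return new UuidSource() {
            @Override public String name() { return name; }
            @Override public String getMetastoreDbUuid() { return uuid; }
        };
    }

    /** Groups HMS instance names by the database UUID each one reports. */
    public static Map<String, List<String>> groupByBackingDb(List<UuidSource> clients) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (UuidSource client : clients) {
            groups.computeIfAbsent(client.getMetastoreDbUuid(), k -> new ArrayList<>())
                  .add(client.name());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two fake HMS instances share one database; a third uses another.
        String sharedDb = UUID.randomUUID().toString();
        String otherDb = UUID.randomUUID().toString();
        List<UuidSource> clients = Arrays.asList(
                fixed("hms-a", sharedDb),
                fixed("hms-b", sharedDb),
                fixed("hms-c", otherDb));

        for (Map.Entry<String, List<String>> e : groupByBackingDb(clients).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because the UUID would live in the database itself rather than in each HMS
configuration, this grouping would keep working after a migration that changes
the ConnectionURL, which is the weakness of the configuration-based approach.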