[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742472#comment-16742472
 ] 

Vihang Karajgaonkar commented on HIVE-21115:
--------------------------------------------

While working on the POC of this patch, we realized that there is a fundamental 
issue with a datanucleus-based approach. Any version numbers that datanucleus 
generates are available only post-commit, which is acceptable for certain 
cases. However, if a client wishes to sync with the metastore using the 
{{NotificationEvent}} API, the before and after objects in the alter events 
will not carry the updated versions. This happens because the event is 
generated before the actual commit, using the data of the before and after 
thrift objects. So we could either fetch the object from the database again to 
get the after object, or generate the version number in the metastore instead 
of relying on datanucleus, so that we know what the new version number will be 
at event creation time.
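
To make the second option concrete, here is a minimal sketch, not the actual 
patch. It assumes the version is kept in the object's parameters map; 
{{TableSketch}}, {{AlterEventSketch}} and the {{objectVersion}} key are 
hypothetical stand-ins, not metastore classes. The point is only that the 
metastore computes the new version itself, so the after object placed in the 
event already carries it before the commit happens.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a thrift Table; only the parameters map matters here.
final class TableSketch {
    final String name;
    final Map<String, String> parameters = new HashMap<>();

    TableSketch(String name) {
        this.name = name;
    }

    TableSketch copy() {
        TableSketch t = new TableSketch(name);
        t.parameters.putAll(parameters);
        return t;
    }
}

// Hypothetical stand-in for the payload of an alter event.
final class AlterEventSketch {
    final TableSketch before;
    final TableSketch after;

    AlterEventSketch(TableSketch before, TableSketch after) {
        this.before = before;
        this.after = after;
    }
}

final class VersionedAlterSketch {
    // Assumed parameter key; the real key name is not decided in this thread.
    static final String VERSION_KEY = "objectVersion";

    // An object without the parameter is treated as version 0.
    static long versionOf(TableSketch t) {
        return Long.parseLong(t.parameters.getOrDefault(VERSION_KEY, "0"));
    }

    // Builds the alter event before any commit. The metastore derives the new
    // version from the stored object, so the after object already has it and
    // whatever version the client sent in the request is overwritten.
    static AlterEventSketch alter(TableSketch stored, TableSketch requested) {
        TableSketch after = requested.copy();
        long next = versionOf(stored) + 1;
        after.parameters.put(VERSION_KEY, Long.toString(next));
        return new AlterEventSketch(stored.copy(), after);
    }
}
```

With a stored table at version 4, {{VersionedAlterSketch.alter(stored, 
requested)}} returns an event whose before object reports version 4 and whose 
after object reports version 5, without re-reading anything from the database.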

In general, having the metastore generate the version number has the following 
advantages:
1. Metastore has complete control over the version generation logic. If we rely 
on datanucleus we don't really control the code which generates the version 
numbers. Hence any anomalies or bugs in that code would cause a problem.
2. It is consistent, since all the other fields of a thrift object are generated 
by the metastore itself, like createTime, lastDDL time etc. We don't rely on 
datanucleus to generate any application data elsewhere, except for the unique 
ids used to identify each of the M* objects (MDatabase, MTable etc), which 
should be seen as an internal mechanism of datanucleus.

On the flip side, it complicates the logic to generate the version numbers. At 
the very least we need to store one value (for version) for each 
table/database/partition in the database and we need to make sure the version 
increment logic works when HMS-HA is enabled without any race conditions.
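
One way to keep the increment safe with several HMS instances is a 
compare-and-set update: bump the version only if it still has the value the 
caller read. The sketch below illustrates that idea under loud assumptions. 
The in-memory map merely stands in for the backing database, 
{{VersionStoreSketch}} is a hypothetical name, and a real implementation 
would express the same check as a conditional UPDATE (or a row lock) inside 
the metastore's database transaction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: the map plays the role of a per-object version column.
final class VersionStoreSketch {
    private final ConcurrentMap<String, Long> versions = new ConcurrentHashMap<>();

    // Objects that were never versioned are treated as version 0.
    long current(String objectName) {
        return versions.getOrDefault(objectName, 0L);
    }

    // Analogue of a conditional update such as
    //   UPDATE ... SET version = :expected + 1
    //   WHERE name = :name AND version = :expected
    // Returns false if another writer changed the version first.
    boolean compareAndIncrement(String objectName, long expected) {
        if (expected == 0L) {
            // First version: succeeds only if no row exists yet.
            return versions.putIfAbsent(objectName, 1L) == null;
        }
        return versions.replace(objectName, expected, expected + 1);
    }

    // Retries until the compare-and-set wins and returns the new version.
    long increment(String objectName) {
        while (true) {
            long seen = current(objectName);
            if (compareAndIncrement(objectName, seen)) {
                return seen + 1;
            }
        }
    }
}
```

Two writers that both read version 1 cannot both move the object to version 
2: the second {{compareAndIncrement}} call returns false and has to re-read 
before retrying.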

> Add support for object versions in metastore
> --------------------------------------------
>
>                 Key: HIVE-21115
>                 URL: https://issues.apache.org/jira/browse/HIVE-21115
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (e.g. 
> catName, dbName and tblName together are unique for a table). Once a table or 
> partition is created it can be altered in many ways. There is currently no 
> good way to identify the version of the object once it is altered. For 
> example, suppose there are two clients (Hive and Impala) using the same 
> metastore. Once some alter operations are performed by one client, another 
> client which wants to do an alter operation has no good way to know if the 
> object which it has is the same as the one stored in metastore. Metastore 
> updates the {{transient_lastDdlTime}} every time there is a DDL operation on 
> the object. However, this value cannot be relied on by all the clients, since 
> after HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state does 
> not remove the {{transient_lastDdlTime}}, and metastore will then not update 
> it. Secondly, if there is a clock skew between multiple HMS instances when 
> HMS-HA is configured, time values cannot be relied on to find out the 
> sequence of alter operations on a given object.
> This JIRA proposes to use the JDO versioning support in Datanucleus 
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate an incrementing sequence number every time an object is altered. 
> This value can be set as one of the values in the parameters. The advantage 
> of using Datanucleus is that the versioning can be done across HMS instances 
> as part of the database transaction and it should work for all the supported 
> databases.
> In theory such a version can be used to detect if the client is presenting an 
> object which is "stale" when issuing an alter request. Metastore can choose 
> to reject such an alter request, since the client may be caching an old 
> version of the object and any alter operation on such a stale object can 
> potentially overwrite previous operations. However, this can be done in a 
> separate JIRA.



