[
https://issues.apache.org/jira/browse/HIVE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998363#comment-14998363
]
Elliot West commented on HIVE-12285:
------------------------------------
[~cwsteinbach], I wanted to check if you were intending to provide an
implementation for this? I have the beginnings of a patch that simply wraps the
{{IMetaStoreClient}} lock-related methods, but it's lacking some tests and
would need to go through my company's OSS approval process before I could
submit it. I'm happy to proceed (or not) as you wish.
> Add locking to HCatClient
> -------------------------
>
> Key: HIVE-12285
> URL: https://issues.apache.org/jira/browse/HIVE-12285
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog
> Affects Versions: 2.0.0
> Reporter: Elliot West
> Assignee: Carl Steinbach
> Labels: concurrency, hcatalog, lock, locking, locks
>
> With the introduction of a concurrency model (HIVE-1293) Hive uses locks to
> coordinate access and updates to both table data and metadata. Within the
> Hive CLI such lock management is seamless. However, Hive provides additional
> APIs that permit interaction with data repositories, namely the HCatalog
> APIs. Currently, operations implemented by these APIs do not participate in
> Hive's locking scheme. Furthermore, the APIs do not expose the locking
> mechanisms (unlike the Metastore Thrift API, which does), so users cannot
> interact with locks explicitly either. As a result, users of the APIs have no
> choice but to manipulate these data repositories outside the control of
> Hive's lock management, which can lead to data inconsistencies both for
> external processes using the API and for queries executing within Hive.
> h3. Scope of work
> This ticket is concerned with sections of the HCatalog API that deal with DDL
> type operations using the metastore, not with those whose purpose is to
> read/write table data. A separate issue already exists for adding locking to
> HCat readers and writers (HIVE-6207).
> h3. Proposed work
> The following work items would serve as a minimum deliverable that would
> allow API users to work effectively with locks:
> * Comprehensively document on the wiki the locks required for various Hive
> operations. At a minimum this should cover all operations exposed by
> {{HCatClient}}. The [Locking design
> document|https://cwiki.apache.org/confluence/display/Hive/Locking] can be
> used as a starting point or perhaps updated.
> * Implement methods and types in the {{HCatClient}} API that allow users to
> manipulate Hive locks. For the most part I'd expect these to delegate to the
> metastore API implementations:
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
> ** -{{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}-
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
> ** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
> ** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
> ** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
> ** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
> ** {{org.apache.hadoop.hive.metastore.api.LockType}}
> ** {{org.apache.hadoop.hive.metastore.api.LockState}}
> ** -{{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}-
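
The delegation described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: {{LockBackend}}, {{InMemoryBackend}}, {{LockingClient}} and their method names are hypothetical stand-ins, not Hive or HCatalog types. They mirror the shape of the metastore lock calls so that the sketch runs without a metastore. A real patch would presumably pass a {{LockRequest}} to {{IMetaStoreClient}} and inspect the returned {{LockResponse}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Hive. They only mirror
// the shape of the lock, checkLock and unlock calls listed above.
class LockSketch {

    enum State { ACQUIRED, WAITING }

    // Stand-in for the metastore client's lock-related methods.
    interface LockBackend {
        long lock(String resource);      // returns a lock id
        State checkLock(long lockId);
        void unlock(long lockId);
    }

    // In-memory fake that grants every request, so the sketch is runnable.
    static class InMemoryBackend implements LockBackend {
        private final Map<Long, String> held = new HashMap<>();
        private long nextId = 1;

        public long lock(String resource) {
            long id = nextId++;
            held.put(id, resource);
            return id;
        }

        public State checkLock(long lockId) {
            if (!held.containsKey(lockId)) {
                throw new IllegalStateException("No such lock: " + lockId);
            }
            return State.ACQUIRED;
        }

        public void unlock(long lockId) {
            if (held.remove(lockId) == null) {
                throw new IllegalStateException("No such lock: " + lockId);
            }
        }

        int heldCount() {
            return held.size();
        }
    }

    // Hypothetical client facade: every method delegates to the backend.
    static class LockingClient {
        private final LockBackend backend;

        LockingClient(LockBackend backend) {
            this.backend = backend;
        }

        long acquire(String db, String table) {
            return backend.lock(db + "." + table);
        }

        State check(long lockId) {
            return backend.checkLock(lockId);
        }

        void release(long lockId) {
            backend.unlock(lockId);
        }
    }

    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        LockingClient client = new LockingClient(backend);
        long id = client.acquire("default", "events");
        try {
            // A DDL-style operation would run here while the lock is held.
            System.out.println("state=" + client.check(id));
        } finally {
            client.release(id); // always release, even if the operation fails
        }
        System.out.println("held=" + backend.heldCount());
    }
}
```

With explicit management like this, pairing every acquire with a release in a {{finally}} block is the caller's responsibility.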
> h3. Additional proposals
> Explicit lock management should be fairly simple to add to {{HCatClient}}.
> However, it puts the onus on the API user to understand locking correctly
> and to write code that uses locks appropriately. Failure to do so may
> have undesirable consequences. With a simpler user model, the operations
> exposed on the API would automatically acquire and release the locks that
> they need. This might work well for small numbers of operations, but
> perhaps not for large sequences of invocations. (Do we need to worry about
> this, though, given that the API methods usually accept batches?)
> Additionally, tasks such as heartbeat management could be handled implicitly
> for long-running sets of operations. With these concerns in mind, it may
> also be beneficial to deliver some of the following:
> * A means to automatically acquire/release appropriate locks for
> {{HCatClient}} operations.
> * A component that maintains a lock heartbeat from the client.
> * A strategy for switching between manual/automatic lock management,
> analogous to SQL's {{autocommit}} for transactions.
> An API for lock and heartbeat management already exists in the HCatalog
> Mutation API (see:
> {{org.apache.hive.hcatalog.streaming.mutate.client.lock}}). It will likely
> make sense to refactor this code, the code that uses it, or both.
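
One possible shape for the automatic acquire/release and heartbeat items above is a lock handle that can be used in a try-with-resources block. The sketch below rests on assumptions: {{LockBackend}}, {{ManagedLock}} and {{CountingBackend}} are hypothetical names, not part of Hive or of the existing Mutation API lock package, and the heartbeat period is an arbitrary illustrative value.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: these types are illustrative and do not exist in Hive.
class AutoLockSketch {

    // Stand-in for the lock, heartbeat and unlock calls of a metastore client.
    interface LockBackend {
        long lock(String resource);
        void heartbeat(long lockId);
        void unlock(long lockId);
    }

    // Acquires on construction, heartbeats in the background, releases on close.
    static class ManagedLock implements AutoCloseable {
        private final LockBackend backend;
        private final long lockId;
        private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-heartbeat");
                t.setDaemon(true); // never keep the JVM alive just for heartbeats
                return t;
            });

        ManagedLock(LockBackend backend, String resource, long periodMillis) {
            this.backend = backend;
            final long id = backend.lock(resource);
            this.lockId = id;
            scheduler.scheduleAtFixedRate(
                () -> backend.heartbeat(id),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        }

        @Override
        public void close() {
            scheduler.shutdownNow(); // stop scheduling further heartbeats
            backend.unlock(lockId);
        }
    }

    // Counting fake so the sketch runs without a metastore.
    static class CountingBackend implements LockBackend {
        final AtomicInteger held = new AtomicInteger();
        final AtomicInteger heartbeats = new AtomicInteger();

        public long lock(String resource) {
            held.incrementAndGet();
            return 1L;
        }

        public void heartbeat(long lockId) {
            heartbeats.incrementAndGet();
        }

        public void unlock(long lockId) {
            held.decrementAndGet();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountingBackend backend = new CountingBackend();
        try (ManagedLock lock = new ManagedLock(backend, "default.events", 20)) {
            Thread.sleep(100); // stands in for a long-running set of operations
        }
        System.out.println("held after close: " + backend.held.get());
        System.out.println("heartbeats sent: " + backend.heartbeats.get());
    }
}
```

A production version would likely wait for any in-flight heartbeat to finish before unlocking, and would need to decide what to do when a heartbeat fails. The manual/automatic switch could then amount to choosing whether API operations create such a handle themselves.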