[
https://issues.apache.org/jira/browse/HIVE-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810130#comment-16810130
]
Todd Lipcon commented on HIVE-21506:
------------------------------------
Does this imply that you'd move the transaction manager out of the HMS into a
standalone daemon? Then we need to worry about HA for that daemon as well as
what happens to locks if the daemon crashes, right? It's probably possible but
would be quite a bit of work.
Another potential design for scaling lock management is to use revocable
"sticky" locks. I think there's a decent amount of literature from the
shared-nothing DBMS world on this technique. With some quick googling I found
that the Frangipani DFS paper has some discussion of the technique, but I think
we can probably find a more detailed description of it elsewhere.
At a very high level, the idea would look something like this for shared locks:
- when a user wants to acquire a shared lock, the HMS checks if any other
transaction already has the same shared lock held. If not, we acquire the lock
in the database, and associate it not with any particular transaction, but with
the HMS's lock manager itself. The HMS then becomes responsible for
heartbeating the lock while it's held. In essence, the HMS has now taken out a
"lease" on this lock.
- if the HMS already has this shared lock held on behalf of another
transaction, increment an in-memory reference count.
- when a lock is released, if the refcount > 1, simply decrement the in-memory
ref count.
- if the lock is released and the refcount goes to 0, the HMS can be lazy about
releasing the lock in the DB (either forever or for some amount of time). In
essence the lock is "sticky".
Given that most locks are shared locks, this should mean that the majority of
locking operations do not require any trip to the RDBMS and can be processed in
memory, but are backed persistently by a lock in the DB.
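To make the shared-lock path concrete, here is a rough Java sketch of what the in-memory side could look like. Every name here (StickySharedLockManager, LeasedLock, the *InDb stand-ins) is hypothetical and not existing HMS code; the DB calls are placeholders for whatever statements the real TxnHandler would issue against the backend RDBMS.
{code:java}
// Hypothetical sketch of the "sticky" shared-lock path described above.
import java.util.HashMap;
import java.util.Map;

public class StickySharedLockManager {

    /** In-memory state for one shared lock "leased" by this HMS instance. */
    private static class LeasedLock {
        int refCount;          // active shared holders on this HMS
        long lastHeartbeatMs;  // renewed periodically while the lease is held
    }

    private final Map<String, LeasedLock> leases = new HashMap<>();

    /** Acquire a shared lock on the given resource for some transaction. */
    public synchronized void acquireShared(String resource) {
        LeasedLock lease = leases.get(resource);
        if (lease == null) {
            // First holder on this HMS: take the lock in the RDBMS, owned by the
            // HMS lock manager itself rather than by any particular transaction.
            acquireSharedLockInDb(resource);
            lease = new LeasedLock();
            lease.lastHeartbeatMs = System.currentTimeMillis();
            leases.put(resource, lease);
        }
        // Subsequent holders only touch memory.
        lease.refCount++;
    }

    /** Release one shared holder; the DB lock is kept "sticky" at refcount 0. */
    public synchronized void releaseShared(String resource) {
        LeasedLock lease = leases.get(resource);
        if (lease == null) {
            return; // not held here (or already revoked)
        }
        lease.refCount--;
        // Deliberately do NOT release the DB lock when refCount hits 0;
        // a lazy background task or a revocation request does that later.
    }

    /** Periodic task: heartbeat leased locks so other HMSs see them as live. */
    public synchronized void heartbeatLeases() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, LeasedLock> e : leases.entrySet()) {
            heartbeatLockInDb(e.getKey());
            e.getValue().lastHeartbeatMs = now;
        }
    }

    // --- stand-ins for real RDBMS operations --------------------------------
    private void acquireSharedLockInDb(String resource)  { /* INSERT into lock table */ }
    private void heartbeatLockInDb(String resource)      { /* UPDATE heartbeat column */ }
}
{code}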
If a caller wants to acquire an exclusive lock which conflicts with an existing
shared lock in the DB, we need to implement revocation:
- add a record indicating that there's a waiter on the lock, blocked on the
existing shared lock(s)
- send a revocation request to any HMS(s) holding the shared locks. In the case
that they're just being held in "sticky" mode, they can be revoked immediately.
If there is actually an active refcount, this will just enforce that new shared
lockers need to wait instead of incrementing the refcount.
- in the case that an HMS holding a sticky lock has crashed or been partitioned,
we need to wait for its "lease" to expire before we can revoke its lock.
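A similarly hedged sketch of the requester side of revocation, under the same hypothetical names and an assumed fixed lease timeout (none of this exists in HMS today):
{code:java}
// Hypothetical sketch of the exclusive-lock / revocation path described above.
import java.util.List;

public class StickyLockRevocation {

    /** One HMS instance currently leasing a conflicting shared lock. */
    public static class LeaseHolder {
        String hmsInstance;
        long lastHeartbeatMs;
    }

    private static final long LEASE_TIMEOUT_MS = 60_000; // assumed lease length

    /** Called when an exclusive-lock request conflicts with leased shared locks. */
    public void requestExclusive(String resource, List<LeaseHolder> holders) {
        // 1. Record in the DB that there is a waiter, so new shared lockers
        //    queue up behind it instead of bumping in-memory refcounts.
        recordWaiterInDb(resource);

        long now = System.currentTimeMillis();
        for (LeaseHolder h : holders) {
            if (now - h.lastHeartbeatMs > LEASE_TIMEOUT_MS) {
                // 2. Holder looks crashed/partitioned: its lease has expired,
                //    so we may revoke the lock on its behalf.
                forciblyReleaseInDb(resource, h.hmsInstance);
            } else {
                // 3. Ask the live holder to give the lock back. A purely
                //    "sticky" holder (refcount 0) can drop it immediately;
                //    one with active holders drops it once they drain.
                sendRevocationRequest(h.hmsInstance, resource);
            }
        }
    }

    // --- stand-ins for real RDBMS / RPC operations ---------------------------
    private void recordWaiterInDb(String resource) { }
    private void forciblyReleaseInDb(String resource, String hmsInstance) { }
    private void sendRevocationRequest(String hmsInstance, String resource) { }
}
{code}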
There's some trickiness to think through about client->HMS "stickiness" in HA
scenarios, as well. Right now, the lock requests for a transaction may be sent
to a different HMS than the 'commit/abort' request, but that would be difficult
to support with "sticky" locks.
All of the above is a bit complicated, so maybe a first step is to just look at
some kind of stress test/benchmark and understand whether we can make any
changes to the way we manage the RDBMS tables to be more efficient? Perhaps if
we specialize the implementation for a specific RDBMS (e.g. Postgres) we could
get some benefits here (e.g. stored procedures to avoid round trips, if that
turns out to be the bottleneck).
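For illustration only, this is the kind of round-trip saving a stored procedure could buy: one call carries a whole lock request instead of one statement per lock component. The procedure name and signature (acquire_locks) are made up, not anything that exists in HMS or its schema today.
{code:java}
// Illustration of batching lock acquisition into a single (hypothetical)
// stored-procedure call over plain JDBC.
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class StoredProcLockExample {
    public static void acquireLocks(Connection conn, long txnId, String[] lockComponents)
            throws SQLException {
        // One round trip for the whole lock request.
        try (CallableStatement cs = conn.prepareCall("{call acquire_locks(?, ?)}")) {
            cs.setLong(1, txnId);
            cs.setArray(2, conn.createArrayOf("varchar", lockComponents));
            cs.execute();
        }
    }
}
{code}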
> Memory based TxnHandler implementation
> --------------------------------------
>
> Key: HIVE-21506
> URL: https://issues.apache.org/jira/browse/HIVE-21506
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Peter Vary
> Priority: Major
>
> The current TxnHandler implementations use the backend RDBMS to store every
> Hive lock and all transaction data, so multiple TxnHandler instances can run
> simultaneously and serve requests. The continuous communication/locking done
> on the RDBMS side puts a serious load on the backend databases and also
> restricts the possible throughput.
> If it is possible to have only a single active TxnHandler instance (with the
> current design, a single HMS), then we can provide much better performance by
> using only Java-based locking. We still have to store the committed write
> transactions in the RDBMS (or, later, some other persistent storage), but
> other lock and transaction operations could remain memory-only.
> The most important drawbacks of this solution are that we definitely lose
> scalability once a single TxnHandler instance is no longer able to serve the
> requests (see NameNode), and fault tolerance in the sense that ongoing
> transactions have to be terminated when the TxnHandler fails. If these
> drawbacks are acceptable in certain situations, then we can provide better
> throughput for the users.