[
https://issues.apache.org/jira/browse/HIVE-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810130#comment-16810130
]
Todd Lipcon commented on HIVE-21506:
------------------------------------
Does this imply that you'd move the transaction manager out of the HMS into a
standalone daemon? Then we need to worry about HA for that daemon as well as
what happens to locks if the daemon crashes, right? It's probably possible but
would be quite a bit of work.
Another potential design for scaling lock management is to use revocable
"sticky" locks. I think there's a decent amount of literature from the
shared-nothing DBMS world on this technique. With some quick googling I found
that the Frangipani DFS paper has some discussion of the technique, but I think
we can probably find a more detailed description of it elsewhere.
At a very high level, the idea would look something like this for shared locks:
- when a user wants to acquire a shared lock, the HMS checks if any other
transaction already has the same shared lock held. If not, we acquire the lock
in the database, and associate it not with any particular transaction, but with
the HMS's lock manager itself. The HMS then becomes responsible for
heartbeating the lock while it's held. In essence, the HMS has now taken out a
"lease" on this lock.
- if the HMS already has this shared lock held on behalf of another
transaction, increment an in-memory reference count.
- when a lock is released, if the refcount > 1, simply decrement the in-memory
ref count.
- if the lock is released and the refcount goes to 0, the HMS can be lazy about
releasing the lock in the DB (either forever or for some amount of time). In
essence the lock is "sticky".
Given that most locks are shared locks, this should mean that the majority of
locking operations do not require any trip to the RDBMS and can be processed in
memory, but are backed persistently by a lock in the DB.
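To make the shared-lock path concrete, here is a rough Java sketch of what the in-memory side could look like. Every name here (StickySharedLockManager, LeasedLock, the *InDb stand-ins) is hypothetical and not existing HMS code; the DB calls are placeholders for whatever statements the real TxnHandler would issue against the backend RDBMS.
{code:java}
// Hypothetical sketch of the "sticky" shared-lock path described above.
import java.util.HashMap;
import java.util.Map;

public class StickySharedLockManager {

    /** In-memory state for one shared lock "leased" by this HMS instance. */
    private static class LeasedLock {
        int refCount;          // active shared holders on this HMS
        long lastHeartbeatMs;  // renewed periodically while the lease is held
    }

    private final Map<String, LeasedLock> leases = new HashMap<>();

    /** Acquire a shared lock on the given resource for some transaction. */
    public synchronized void acquireShared(String resource) {
        LeasedLock lease = leases.get(resource);
        if (lease == null) {
            // First holder on this HMS: take the lock in the RDBMS, owned by the
            // HMS lock manager itself rather than by any particular transaction.
            acquireSharedLockInDb(resource);
            lease = new LeasedLock();
            lease.lastHeartbeatMs = System.currentTimeMillis();
            leases.put(resource, lease);
        }
        // Subsequent holders only touch memory.
        lease.refCount++;
    }

    /** Release one shared holder; the DB lock is kept "sticky" at refcount 0. */
    public synchronized void releaseShared(String resource) {
        LeasedLock lease = leases.get(resource);
        if (lease == null) {
            return; // not held here (or already revoked)
        }
        lease.refCount--;
        // Deliberately do NOT release the DB lock when refCount hits 0;
        // a lazy background task or a revocation request does that later.
    }

    /** Periodic task: heartbeat leased locks so other HMSs see them as live. */
    public synchronized void heartbeatLeases() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, LeasedLock> e : leases.entrySet()) {
            heartbeatLockInDb(e.getKey());
            e.getValue().lastHeartbeatMs = now;
        }
    }

    // --- stand-ins for real RDBMS operations --------------------------------
    private void acquireSharedLockInDb(String resource)  { /* INSERT into lock table */ }
    private void heartbeatLockInDb(String resource)      { /* UPDATE heartbeat column */ }
}
{code}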
If a caller wants to acquire an exclusive lock which conflicts with an existing
shared lock in the DB, we need to implement revocation:
- add a record indicating that there's a waiter on the lock, blocked on the
existing shared lock(s)
- send a revocation request to any HMS(s) holding the shared locks. In the case
that they're just being held in "sticky" mode, they can be revoked immediately.
If there is actually an active refcount, this will just enforce that new shared
lockers need to wait instead of incrementing the refcount.
- in the case that an HMS holding a sticky lock has crashed or been partitioned,
we need to wait for its "lease" to expire before we can revoke its lock.
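A similarly hedged sketch of the requester side of revocation, under the same hypothetical names and an assumed fixed lease timeout (none of this exists in HMS today):
{code:java}
// Hypothetical sketch of the exclusive-lock / revocation path described above.
import java.util.List;

public class StickyLockRevocation {

    /** One HMS instance currently leasing a conflicting shared lock. */
    public static class LeaseHolder {
        String hmsInstance;
        long lastHeartbeatMs;
    }

    private static final long LEASE_TIMEOUT_MS = 60_000; // assumed lease length

    /** Called when an exclusive-lock request conflicts with leased shared locks. */
    public void requestExclusive(String resource, List<LeaseHolder> holders) {
        // 1. Record in the DB that there is a waiter, so new shared lockers
        //    queue up behind it instead of bumping in-memory refcounts.
        recordWaiterInDb(resource);

        long now = System.currentTimeMillis();
        for (LeaseHolder h : holders) {
            if (now - h.lastHeartbeatMs > LEASE_TIMEOUT_MS) {
                // 2. Holder looks crashed/partitioned: its lease has expired,
                //    so we may revoke the lock on its behalf.
                forciblyReleaseInDb(resource, h.hmsInstance);
            } else {
                // 3. Ask the live holder to give the lock back. A purely
                //    "sticky" holder (refcount 0) can drop it immediately;
                //    one with active holders drops it once they drain.
                sendRevocationRequest(h.hmsInstance, resource);
            }
        }
    }

    // --- stand-ins for real RDBMS / RPC operations ---------------------------
    private void recordWaiterInDb(String resource) { }
    private void forciblyReleaseInDb(String resource, String hmsInstance) { }
    private void sendRevocationRequest(String hmsInstance, String resource) { }
}
{code}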
There's some trickiness to think through about client->HMS "stickiness" in HA
scenarios, as well. Right now, the lock requests for a transaction may be sent
to a different HMS than the 'commit/abort' request, but that would be difficult
to support with "sticky" locks.
All of the above is a bit complicated, so maybe a first step is to just look at
some kind of stress test/benchmark and understand whether we can make any
changes to the way we manage the RDBMS tables to be more efficient? Perhaps if
we specialize the implementation for a specific RDBMS (e.g. Postgres) we could
get some benefits here (e.g. stored procedures to avoid round trips, if that
turns out to be the bottleneck).
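For illustration only, this is the kind of round-trip saving a stored procedure could buy: one call carries a whole lock request instead of one statement per lock component. The procedure name and signature (acquire_locks) are made up, not anything that exists in HMS or its schema today.
{code:java}
// Illustration of batching lock acquisition into a single (hypothetical)
// stored-procedure call over plain JDBC.
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class StoredProcLockExample {
    public static void acquireLocks(Connection conn, long txnId, String[] lockComponents)
            throws SQLException {
        // One round trip for the whole lock request.
        try (CallableStatement cs = conn.prepareCall("{call acquire_locks(?, ?)}")) {
            cs.setLong(1, txnId);
            cs.setArray(2, conn.createArrayOf("varchar", lockComponents));
            cs.execute();
        }
    }
}
{code}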
> Memory based TxnHandler implementation
> --------------------------------------
>
> Key: HIVE-21506
> URL: https://issues.apache.org/jira/browse/HIVE-21506
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Peter Vary
> Priority: Major
>
> The current TxnHandler implementations use the backend RDBMS to store every
> Hive lock and all transaction data, so multiple TxnHandler instances can run
> simultaneously and serve requests. The continuous communication/locking done
> on the RDBMS side puts a serious load on the backend databases and also
> restricts the possible throughput.
> If it is possible to have only a single active TxnHandler instance (with the
> current design, a single HMS), then we can provide much better performance by
> using only Java-based locking. We still have to store the committed write
> transactions in the RDBMS (or, later, some other persistent storage), but
> other lock and transaction operations could remain memory-only.
> The most important drawbacks of this solution are that we definitely lose
> scalability once a single TxnHandler instance is no longer able to serve the
> requests (see NameNode), and fault tolerance in the sense that ongoing
> transactions have to be terminated when the TxnHandler fails. If these
> drawbacks are acceptable in certain situations, then we can provide better
> throughput for the users.