[ 
https://issues.apache.org/jira/browse/HBASE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870199#action_12870199
 ] 

Michael Dalton commented on HBASE-2332:
---------------------------------------

Here's a rough sketch of the algorithm, sorry if it's a bit verbose. I take 
into account that row locks are per-RS. To give you a quick overview, we 
support reading, updating, and renaming a value V in a row. Renames move value 
V from one row to another. All reads occur without row locks (unless a rename 
failure occurs), and all updates (put/delete), which only affect a single row, 
use HBase CAS. The only situation where I need locking is to deal with renames 
and rename failure recovery. I need rename to appear to be atomic from the 
client's perspective, but I can perform rename failure recovery so long as it 
appears transparent.

Rename of value V from row A to row B is accomplished in 5 steps:
(1) Acquire row locks on A and B in row key order (so that a deadly embrace
cannot occur where another client locks B and then A).
(2) Update value V in row A using CAS with V~src~, which marks V as a rename
source and includes the row key of B as the destination.
(3) Update B using CAS (with an empty value as the comparison value) with a
modified copy of V, V~dst~, which marks B as the destination (and includes A's
row key as the source).
(4) Delete V from row A.
(5) Put the original value V in row B (this time not marked as a rename source
or destination, just the exact original bytes).
In steps 2-5, each CAS, Delete, and Put operation uses the appropriate row
lock, so row lock A is used for any modifications to row A.
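The 5 steps above can be sketched as a minimal in-memory simulation (this is
not HBase code; the names store, locks, cas, and rename, and the tuple markers
for V~src~/V~dst~, are invented for illustration):

```python
import threading

store = {"A": "V"}              # row key -> value (absent/None means empty)
locks = {k: threading.Lock() for k in ("A", "B")}

def cas(row, expected, new):
    """Compare-and-swap: write `new` only if the row currently holds `expected`."""
    if store.get(row) == expected:
        store[row] = new
        return True
    return False

def rename(src, dst, value):
    # (1) acquire both row locks in row key order to avoid deadlock
    first, second = sorted((src, dst))
    with locks[first], locks[second]:
        # (2) mark the source value as a pending rename source, recording dst
        if not cas(src, value, ("SRC", value, dst)):
            return False
        # (3) write a pending rename destination into dst (empty comparison value)
        if not cas(dst, None, ("DST", value, src)):
            return False
        # (4) delete the marked value from the source row
        store.pop(src)
        # (5) write the original, unmarked value into the destination row
        store[dst] = value
        return True

rename("A", "B", "V")
print(store)   # {'B': 'V'}
```

A crash between any two steps leaves a SRC and/or DST marker behind, which is
exactly the 'pending rename' state the recovery procedure below cleans up.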

At any one of these 5 steps the row locks may be lost or the machine
performing the rename may die. This will leave one or both of the rename
source/destination in the 'pending rename' state. Thus we must be able to
recover from any prefix of the rename operations executed by a failed rename;
recovery is triggered when we read a value marked as a pending rename source
(V~src~) or pending rename destination (V~dst~).
Initial reads occur without locks. However, reads are _not_ allowed to return
'pending rename' values. Updates (Put/Delete) are not affected by renames:
because all updates use CAS, and because no read operation is permitted to
return a 'pending rename' value, no update can successfully modify a value
marked 'pending rename' -- the CAS comparison will fail.
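The read/update behavior can be illustrated with the same in-memory model
(again a hypothetical sketch, not the HBase client API; store, cas, and read
are invented names):

```python
# Row A holds a pending-rename-source marker left behind by a failed rename.
store = {"A": ("SRC", "V", "B")}

def is_pending(v):
    return isinstance(v, tuple) and v[0] in ("SRC", "DST")

def read(row):
    v = store.get(row)
    if is_pending(v):
        # Reads must never return a pending value; a real reader would run
        # rename recovery here and then retry.
        raise RuntimeError("pending rename: recovery required before read")
    return v

def cas(row, expected, new):
    if store.get(row) == expected:
        store[row] = new
        return True
    return False

# An updater that CASes against the original value "V" fails, because the row
# actually holds the marker tuple -- no update can clobber an in-flight rename.
print(cas("A", "V", "V2"))   # False
```

This is why updates need no special rename handling: the CAS comparison value
comes from a prior read, and reads never hand out marked values.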

If a pending rename value is encountered, the reader must recover from the
rename failure. The recovery procedure acquires locks on A _and_ B (which it
can do because pending values include the row key of the 'other' row, i.e.
V~src~ includes B's row key). The recovery process then performs an undo or a
redo depending on whether V~dst~ was ever written: if V~dst~ was never
written, it undoes the rename by writing the original value V back to row A;
otherwise it redoes the rename by re-executing steps (4)-(5). In either case,
all modifications are performed while holding the appropriate row lock.
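The undo/redo decision can be sketched in the same hypothetical in-memory
model (markers are tuples of ("SRC"/"DST", original value, other row key);
store, locks, and recover are invented names, not HBase API):

```python
import threading

store = {"A": ("SRC", "V", "B")}   # a rename that died right after step (2)
locks = {k: threading.Lock() for k in ("A", "B")}

def recover(row):
    """Undo or redo a failed rename discovered at `row`."""
    v = store.get(row)
    if not (isinstance(v, tuple) and v[0] in ("SRC", "DST")):
        return                                   # nothing pending here
    kind, value, other = v
    src, dst = (row, other) if kind == "SRC" else (other, row)
    first, second = sorted((src, dst))           # lock in row key order
    with locks[first], locks[second]:
        if isinstance(store.get(dst), tuple):    # V~dst~ was written: redo
            store.pop(src, None)                 # re-execute step (4)
            store[dst] = value                   # re-execute step (5)
        else:                                    # V~dst~ never written: undo
            store[src] = value                   # restore original V in src

recover("A")
print(store)   # {'A': 'V'} -- the half-done rename was rolled back
```

If the crash instead happened after step (3), both markers exist, the
`isinstance` check on dst succeeds, and recovery rolls the rename forward.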
The critical benefit locks provide is that once the rename recovery procedure 
has acquired row locks on A and B, it is guaranteed that the original rename 
procedure (or previous failed rename recovery procedures) will never 'wake up' 
later and successfully perform mutations related to the failed rename (as their 
locks will have expired).  I can provide more details about this algorithm in 
general if the above is unclear.

This algorithm seems to work correctly given that locks may be lost at any
time. If I understand you correctly, the major issue is that locks are not
currently persisted in the HLog. However, the only risk presented by the lack
of lock persistence is that locks may be lost before lease expiration,
correct? Even so, it is still the case that at most one client can hold a lock
on a row -- it's just that a client may lose its lock even before the lease
expires due to the lack of HLog persistence (and all subsequent operations
performed with the 'lost' lock will fail).

I don't see this as a huge drawback, as anyone dealing with a critical section
guarded by HBase row locks _must_ already handle the case that locks may be
lost, because locks have leases that may expire. Thus any critical section may
fail at an arbitrary point, having completed only a prefix of its operations.
From a correctness perspective, ensuring that programs transparently handle
lock failure due to lease expiration should also cover losing locks due to the
lack of persistence in the HLog. Please correct me if I'm missing something
here concerning HLog durability, or if the effect differs from what I've
described.

> Remove client-exposed row locks from region server
> --------------------------------------------------
>
>                 Key: HBASE-2332
>                 URL: https://issues.apache.org/jira/browse/HBASE-2332
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Todd Lipcon
>             Fix For: 0.22.0
>
>
> Discussion in HBASE-2294 has surfaced that the client-exposed row lock 
> feature in the HBase API may not be scalable/necessary. Additionally there 
> are some benefits we can reap by removing the feature (or pushing it into the 
> client).
