[ 
https://issues.apache.org/jira/browse/HBASE-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129502#comment-14129502
 ] 

Todd Lipcon commented on HBASE-11945:
-------------------------------------

The potential interleaving is:

Client 1: issues a batch with 2000 puts: Put "row1", "cf:col1", {0...1000}, Put 
"row2", "cf:col1", {0...1000}
Client 2: issues a batch with 1 put: Put "row2", "cf:col2", "x"
(i.e. the same row as Client 1's second set of puts, but a different column)

These two clients will contend for the same row lock. The "minibatch" code path 
iterates through the batch trying to acquire locks, and skipping the operations 
for a later pass if the lock is not available. So, I think these may interleave 
as follows:

C1: acquires lock for row1, and is in the process of iterating over the rest of 
the "row1" operations
C2: acquires lock for "row2", and is in the process of actually applying the 
operation to MemStore, etc
C1: fails to acquire the lock for the first row2 op, since C2 already holds it. 
But, there are still 999 more row2 ops to iterate over
C2: commits its "row2" operation, releasing the lock
C1: manages to acquire the lock for a later row2 op (eg the put of "row2", 
"cf:col1", 500)
C1: commits the minibatch

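Now it is easy to see that C1 has committed its put of "500" before the row2 
puts which came earlier from the client (the ones it skipped while C2 held the 
lock), and those skipped puts only get applied in a later pass.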
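
To make the interleaving concrete, here is a small deterministic sketch in 
Python. It is only a toy model of the behaviour described above (try each lock 
in order, defer the op to a later pass if the lock is busy), not the actual 
regionserver code. The names apply_batch and c2_holds_row2, the use of a 
"step" counter as a clock, and the choice of values 0..999 are all assumptions 
made for illustration.

```python
def apply_batch(ops, lock_busy):
    """Toy model of the minibatch loop (not HBase code).

    ops:       list of (row, value) in the order the client submitted them.
    lock_busy: function (row, step) -> True if another client holds the
               lock on `row` at iteration step `step`.
    Returns the order in which the ops end up being committed.
    Note: loops forever if a lock never becomes free; fine for this sketch.
    """
    commit_order = []
    pending = list(ops)
    step = 0
    while pending:
        held = set()              # row locks this pass has acquired
        minibatch, deferred = [], []
        for row, value in pending:
            if row in held or not lock_busy(row, step):
                held.add(row)
                minibatch.append((row, value))
            else:
                deferred.append((row, value))   # retry in a later pass
            step += 1
        commit_order.extend(minibatch)          # commit, release locks
        pending = deferred
    return commit_order


# C1's batch: 1000 puts to row1, then 1000 puts to row2 (values 0..999).
batch = [("row1", v) for v in range(1000)] + [("row2", v) for v in range(1000)]


def c2_holds_row2(row, step):
    # Assumption: C2 holds the row2 lock while C1 visits its first 500 row2 ops.
    return row == "row2" and 1000 <= step < 1500


order = apply_batch(batch, c2_holds_row2)
row2_values = [v for r, v in order if r == "row2"]
final_row2 = row2_values[-1]   # the value that "wins" for row2/cf:col1
```

In this model the first row2 value committed is 500 and the last one is 499, 
so the value left in the row is not the last one the client submitted.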

This re-ordering is unexpected from C1's point of view, since when it later 
reads the row, something other than the "latest" data might persist (eg the 
1000th put it did might actually have gotten executed first instead of last). 
The problem's worse with a delete/insert sequence, when you have a 50% chance 
of ending up with a deleted row at the end.

I haven't tried to reproduce this bug, but I think you could build a functional 
test as follows:

T1: writes batches with 1000 puts (arbitrary contents) to "row1" and 1000 puts 
to "row2" (increasing integers)
T2: writes non-batched writes to a different column of row2
T3: reads "row2" in a loop and verifies that the integer column is never seen 
to decrease.

1000 might not be large enough batches to reliably reproduce it, but I bet you 
could get this to fail eventually.
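
The check that T3 performs could look roughly like the following Python 
sketch. It only covers the verification step over a list of values already 
read back from the integer column; how those values are fetched from the 
cluster is left out, and the name first_decrease is hypothetical.

```python
def first_decrease(observed):
    """Return the index of the first read whose value is lower than the
    previous read, or None if the sequence never decreases."""
    for i in range(1, len(observed)):
        if observed[i] < observed[i - 1]:
            return i
    return None


# Reads that only ever move forward pass the check.
ok_reads = [10, 10, 250, 999]
# A read of 499 after 999 would indicate the reordering described above.
bad_reads = [500, 750, 999, 499]

ok_result = first_decrease(ok_reads)
bad_result = first_decrease(bad_reads)
```

A non-None result would mean the reader saw the integer column go backwards, 
which is the failure the test is looking for.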

> Client writes may be reordered under contention
> -----------------------------------------------
>
>                 Key: HBASE-11945
>                 URL: https://issues.apache.org/jira/browse/HBASE-11945
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.6
>            Reporter: Todd Lipcon
>
> I haven't seen this bug in practice, but I was thinking about this a bit and 
> think there may be a correctness issue with the way that we handle client 
> batches which contain multiple operations which touch the same row. The 
> client expects that these operations will be performed in the same order they 
> were submitted, but under contention I believe they can get arbitrarily 
> reordered, leading to incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
