[ 
https://issues.apache.org/jira/browse/HBASE-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378104#comment-14378104
 ] 

Elliott Clark commented on HBASE-12751:
---------------------------------------

Sure so all the fun happens in HRegion#doMiniBatchMutation.

I'll talk about line numbers and current implementation though they might 
change so I've linked what I currently see.

h2. Normal Flow
* on line 
[3039|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3039]
 row locks for all the mutations possible are obtained. In the current code 
this means that only one set of mutations for a row can continue at a time. 
However if we change that to reduce contention on rows things break farther 
down.
* on line 
[3109|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3109]
 a pre-assigned read point is assigned 1M + some sequence This sequence is 
independent of the actual sequence id. It is entirely possible for this 
sequence to be assigned to edits in a different order than they will actually 
be applied to the wal log.
* on line 
[3141|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3141]
 the mutations are added to the memstore and the memstore adds them to 
CellSkipListSet which uses KVComparator.COMPARATOR to sort mutations. They sort 
in the pretty normal order of rk, family, qual, timestamp, mvcc number. Since 
the sequence Id here is 1M + some these edits sort before anything that's 
committed.
* Each edit that's applied to the memstore is also added to the memstoreCells 
ArrayList.
* the list of edits applied is packaged into a wal edit.
* 
[3182|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3181]
 this waledit is then appended to the wal
* inside of FSHlog 
[1191|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L1191]
 an FSWalEntry is created and added to the ring buffer.
* at some point the ring buffer calls 
[1920|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L1920]
 the event handler's FSHlog.RingBufferEventHandler#onEvent method.
* this event handler notices that the event is an append and so it calls 
[2026|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L2026]
 FSHlog.RingBufferEventHandler#append.
* inside that function at 
[2037|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L2037]
 we call FSWalEntry#stampRegionSequenceId 
* FSWalEntry#stampRegionSequenceId goes through every mutation that is in the 
memstore arraylist and changes their sequence id. 
[127|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java#L127]
* The mutations in the memstore's CellSkipListSet get a new sequence id. 
However the comparator is only run on insert so any change in the sequence 
number that would change order isn't reflected in the order that 
CellSkipListSet will return key values.
* HRegion#doMiniBatchMutation releases the row locks and continues on.
* the WAL is synced and everything is returned as success.

h2. The Issue

The sequence id from MultiVersionConsistencyControl#getPreAssignedWriteNumber 
is completely independent of the sequence id that the ring buffer hands out. 
Right now we luck out and never have to deal with this because all edits to the 
same row are serialized by the row locks. However if we change it so that more 
than one edit for a row can be in the memstore's map with a pre-assigned 
sequence id then this all breaks. Consider the following event for two puts 
coming in at the same time for the same row, cf, qual, and timestamp but 
different values (assuming that row locks are reader/writer and normal puts 
only need readers):

* Put #1 comes in gets reader lock
* Put #2 comes in gets reader lock
* Put #1 gets initial sequence id of 1M
* Put #2 gets initial sequence id of 1M+1
* Put #1 inserts into memstore.
* Put #2 inserts into memstore.
* Memstore is now ordered like this [ Put#2, Put#1 ] because 1M+1 sorts before 
1M
* Put #2 appends to the FSWal. With this Put #2 gets real sequence id of 1
* Put #1 appends to the FSWal. With this Put #1 gets real sequence id of 2
* Put#1 and Put#2 are sync'd and the read point is advanced to 2
* the memstore is still sorted like [ Put#2, Put#1 ] even though their real 
sequence id's say they should be sorted the other way.





> Allow RowLock to be reader writer
> ---------------------------------
>
>                 Key: HBASE-12751
>                 URL: https://issues.apache.org/jira/browse/HBASE-12751
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Elliott Clark
>            Assignee: Nate Edel
>         Attachments: HBASE-12751-v1.patch, HBASE-12751.patch
>
>
> Right now every write operation grabs a row lock. This is to prevent values 
> from changing during a read modify write operation (increment or check and 
> put). However it limits parallelism in several different scenarios.
> If there are several puts to the same row but different columns or stores 
> then this is very limiting.
> If there are puts to the same column then mvcc number should ensure a 
> consistent ordering. So locking is not needed.
> However locking for check and put or increment is still needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to