[
https://issues.apache.org/jira/browse/HBASE-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378104#comment-14378104
]
Elliott Clark commented on HBASE-12751:
---------------------------------------
Sure so all the fun happens in HRegion#doMiniBatchMutation.
I'll talk about line numbers and current implementation though they might
change so I've linked what I currently see.
h2. Normal Flow
* on line
[3039|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3039]
row locks for all the mutations possible are obtained. In the current code
this means that only one set of mutations for a row can continue at a time.
However if we change that to reduce contention on rows things break farther
down.
* on line
[3109|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3109]
a pre-assigned read point is assigned 1M + some sequence This sequence is
independent of the actual sequence id. It is entirely possible for this
sequence to be assigned to edits in a different order than they will actually
be applied to the wal log.
* on line
[3141|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3141]
the mutations are added to the memstore and the memstore adds them to
CellSkipListSet which uses KVComparator.COMPARATOR to sort mutations. They sort
in the pretty normal order of rk, family, qual, timestamp, mvcc number. Since
the sequence Id here is 1M + some these edits sort before anything that's
committed.
* Each edit that's applied to the memstore is also added to the memstoreCells
ArrayList.
* the list of edits applied is packaged into a wal edit.
*
[3182|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3181]
this waledit is then appended to the wal
* inside of FSHlog
[1191|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L1191]
an FSWalEntry is created and added to the ring buffer.
* at some point the ring buffer calls
[1920|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L1920]
the event handler's FSHlog.RingBufferEventHandler#onEvent method.
* this event handler notices that the event is an append and so it calls
[2026|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L2026]
FSHlog.RingBufferEventHandler#append.
* inside that function at
[2037|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L2037]
we call FSWalEntry#stampRegionSequenceId
* FSWalEntry#stampRegionSequenceId goes through every mutation that is in the
memstore arraylist and changes their sequence id.
[127|https://github.com/apache/hbase/blob/a75a2ace4f52daa1eb415f085dfad920b30c3349/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java#L127]
* The mutations in the memstore's CellSkipListSet get a new sequence id.
However the comparator is only run on insert so any change in the sequence
number that would change order isn't reflected in the order that
CellSkipListSet will return key values.
* HRegion#doMiniBatchMutation releases the row locks and continues on.
* the WAL is synced and everything is returned as success.
h2. The Issue
The sequence id from MultiVersionConsistencyControl#getPreAssignedWriteNumber
is completely independent of the sequence id that the ring buffer hands out.
Right now we luck out and never have to deal with this because all edits to the
same row are serialized by the row locks. However if we change it so that more
than one edit for a row can be in the memstore's map with a pre-assigned
sequence id then this all breaks. Consider the following event for two puts
coming in at the same time for the same row, cf, qual, and timestamp but
different values (assuming that row locks are reader/writer and normal puts
only need readers):
* Put #1 comes in gets reader lock
* Put #2 comes in gets reader lock
* Put #1 gets initial sequence id of 1M
* Put #2 gets initial sequence id of 1M+1
* Put #1 inserts into memstore.
* Put #2 inserts into memstore.
* Memstore is now ordered like this [ Put#2, Put#1 ] because 1M+1 sorts before
1M
* Put #2 appends to the FSWal. With this Put #2 gets real sequence id of 1
* Put #1 appends to the FSWal. With this Put #1 gets real sequence id of 2
* Put#1 and Put#2 are sync'd and the read point is advanced to 2
* the memstore is still sorted like [ Put#2, Put#1 ] even though their real
sequence id's say they should be sorted the other way.
> Allow RowLock to be reader writer
> ---------------------------------
>
> Key: HBASE-12751
> URL: https://issues.apache.org/jira/browse/HBASE-12751
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Elliott Clark
> Assignee: Nate Edel
> Attachments: HBASE-12751-v1.patch, HBASE-12751.patch
>
>
> Right now every write operation grabs a row lock. This is to prevent values
> from changing during a read modify write operation (increment or check and
> put). However it limits parallelism in several different scenarios.
> If there are several puts to the same row but different columns or stores
> then this is very limiting.
> If there are puts to the same column then mvcc number should ensure a
> consistent ordering. So locking is not needed.
> However locking for check and put or increment is still needed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)