[
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332025#comment-14332025
]
Lars Hofhansl commented on HBASE-13082:
---------------------------------------
Quick note on why this works:
# StoreScanner is passed an explicit object to sync on in updateReaders (it
does not care what this object is, just that it needs to sync on it).
# We pass the RegionScannerImpl object down as the "sync" object
# All operations that call any StoreScanner method are synchronized already
in RegionScannerImpl (except for nextRaw, but that requires the caller to do
the locking himself)
# Now any region scanner operation will prevent the readers from being updated
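A minimal sketch of the four points above. `SketchStoreScanner` and `SketchRegionScanner` are hypothetical, heavily simplified stand-ins, not the real HBase classes. The store scanner is handed an arbitrary lock object and synchronizes on it only in `updateReaders`. The region scanner passes itself down and does all of its reads inside its own `synchronized` methods, with the `limit` of `nextBatch` playing the role of the Scan's batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StoreScanner: it does not care what the lock
// object is; it only synchronizes on it when the readers are swapped.
final class SketchStoreScanner {
    private final Object lock;
    private List<String> readers; // stand-in for the store file readers
    private int pos = 0;

    SketchStoreScanner(Object lock, List<String> readers) {
        this.lock = lock;
        this.readers = readers;
    }

    // Called by a flush/compaction to switch to the new set of files.
    void updateReaders(List<String> newReaders) {
        synchronized (lock) {
            readers = newReaders;
            pos = 0; // simplified "reseek" onto the new files
        }
    }

    // No locking here: the caller must already hold the lock object's monitor.
    String peek() {
        return pos < readers.size() ? readers.get(pos) : null;
    }

    String next() {
        String cell = peek();
        if (cell != null) {
            pos++;
        }
        return cell;
    }
}

// Hypothetical stand-in for RegionScannerImpl.
final class SketchRegionScanner {
    private final List<SketchStoreScanner> stores = new ArrayList<>();

    synchronized SketchStoreScanner addStore(List<String> readers) {
        // Pass this region scanner down as the "sync" object.
        SketchStoreScanner store = new SketchStoreScanner(this, readers);
        stores.add(store);
        return store;
    }

    // One monitor acquisition covers every StoreScanner call in the batch,
    // so updateReaders() cannot run in the middle of it.
    synchronized List<String> nextBatch(int limit) {
        List<String> out = new ArrayList<>();
        for (SketchStoreScanner store : stores) {
            while (out.size() < limit && store.peek() != null) {
                out.add(store.next());
            }
        }
        return out;
    }
}
```

Because `updateReaders` and `nextBatch` synchronize on the same object, the writes in `updateReaders` are also visible to the next batch without making `readers` volatile; in this sketch, that shared monitor is what makes dropping the per-call locks safe.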
#4 is much coarser than locking on the StoreScanner object. StoreScanner.peek
is by far the worst offender, as it is called all over the place. There is no
way (that I can see) to avoid locking every single operation in StoreScanner
(each lock acquisition causes a memory fence, i.e. a read and write barrier in
this case). As said above, the lock is almost never contended; the problem is
the memory fences, which *kill* multi-core performance.
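To make the difference in granularity concrete, here is a small hypothetical count of lock acquisitions. It does not measure fences directly, and the real cost depends on the JVM and hardware. A scanner that locks inside every `peek()` and `next()` pays roughly two acquisitions per cell (2n + 1 for n cells, counting the final empty `peek()`), while locking once per batch at the region level pays one acquisition per batch. `CountingLock` and `LockGranularity` are illustrative names, not HBase code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: counts how often lock() is called.
final class CountingLock extends ReentrantLock {
    private int acquisitions = 0;

    @Override
    public void lock() {
        super.lock();
        acquisitions++; // incremented while holding the lock
    }

    int acquisitions() {
        return acquisitions;
    }
}

final class LockGranularity {
    // Fine-grained: every peek() and next() takes the scanner's own lock.
    static int fineGrained(int cells) {
        CountingLock lock = new CountingLock();
        int consumed = 0;
        while (true) {
            boolean hasMore;
            lock.lock(); // peek()
            try {
                hasMore = consumed < cells;
            } finally {
                lock.unlock();
            }
            if (!hasMore) {
                break;
            }
            lock.lock(); // next()
            try {
                consumed++;
            } finally {
                lock.unlock();
            }
        }
        return lock.acquisitions();
    }

    // Coarse: the region-level lock is taken once per batch of cells.
    static int coarse(int cells, int batch) {
        CountingLock lock = new CountingLock();
        int consumed = 0;
        while (consumed < cells) {
            lock.lock();
            try {
                for (int i = 0; i < batch && consumed < cells; i++) {
                    consumed++; // peek() + next() with no further locking
                }
            } finally {
                lock.unlock();
            }
        }
        return lock.acquisitions();
    }
}
```

For 1,000 cells, `fineGrained(1000)` returns 2001 while `coarse(1000, 100)` returns 10.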
It does lead to the caveat listed above: a very heavy read load can essentially
prevent flushes or compactions from finishing.
But note that this is *already* the case; it is just currently more likely that
the flush/compaction will get through, because the locks are more fine-grained.
Check out StoreScanner.next(List<Cell>): it already holds a lock for the entire
duration of the row fetch. This patch coarsens that to the Scan's batch and up
to the region, so reads on other stores can lock out flushes/compactions of a
store.
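The starvation caveat can be illustrated with plain Java monitors. In this hypothetical sketch a bare `Object` stands in for `RegionScannerImpl`, the main thread plays a read holding it for a long row fetch or batch, and a second thread plays a flush calling `updateReaders`. While the read holds the monitor, the flush thread sits in `Thread.State.BLOCKED` and only swaps the readers once the monitor is released. Java's intrinsic locks make no fairness guarantee, so under a continuous stream of reads nothing ensures the flush ever wins.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not HBase code.
final class FlushBlockingDemo {
    // Returns true if the "flush" thread was seen BLOCKED while a "read" held
    // the monitor, and only swapped the readers after the read released it.
    static boolean flushWaitsForRead() {
        final Object regionScanner = new Object(); // stand-in for RegionScannerImpl
        final AtomicBoolean swapped = new AtomicBoolean(false);

        // Stand-in for a flush calling updateReaders(), which syncs on the region scanner.
        Thread flush = new Thread(() -> {
            synchronized (regionScanner) {
                swapped.set(true);
            }
        });

        boolean sawBlocked = false;
        try {
            synchronized (regionScanner) { // a long row fetch / batch in progress
                flush.start();
                long deadline = System.nanoTime() + 5_000_000_000L;
                while (!sawBlocked && System.nanoTime() < deadline) {
                    sawBlocked = flush.getState() == Thread.State.BLOCKED;
                    Thread.sleep(1); // sleeping does not release the monitor
                }
                if (swapped.get()) {
                    return false; // cannot happen while we hold the monitor
                }
            }
            flush.join(); // once released, the flush gets the monitor and finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return sawBlocked && swapped.get();
    }
}
```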
Also note that compactions usually run for a long time and only need the lock
once, to switch the readers around; the same goes for flushes. I need to do
more testing, but I doubt it's an issue.
Fair locking can help here, but comes with other issues.
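A hypothetical sketch of the compaction side (not HBase's compactor). Here an explicit `java.util.concurrent.locks.ReentrantLock` stands in for the region scanner's monitor, purely so that fairness becomes a constructor flag (`new ReentrantLock(true)`). The expensive merge runs without the lock, since the input files are immutable, and the only critical section in `compact` is the switch of readers. Fair locks are generally documented to have lower overall throughput than the default non-fair mode, which may be among the "other issues" mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not HBase code.
final class CompactionSketch {
    // fair = true grants the lock to the longest-waiting thread.
    private final ReentrantLock scanLock;
    private final List<String> readers;

    CompactionSketch(boolean fair, List<String> initialReaders) {
        this.scanLock = new ReentrantLock(fair);
        this.readers = new ArrayList<>(initialReaders);
    }

    // Scans take the lock for their whole batch; here we just copy the readers.
    List<String> snapshotReaders() {
        scanLock.lock();
        try {
            return new ArrayList<>(readers);
        } finally {
            scanLock.unlock();
        }
    }

    void compact(List<String> inputs) {
        // Expensive part, done without holding the lock.
        List<String> sorted = new ArrayList<>(inputs);
        Collections.sort(sorted); // stand-in for merging the files
        String output = String.join("+", sorted);

        // The single critical section: switch the readers around.
        scanLock.lock();
        try {
            readers.removeAll(inputs);
            readers.add(output);
        } finally {
            scanLock.unlock();
        }
    }

    boolean isFair() {
        return scanLock.isFair();
    }
}
```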
I've done local testing (a single-node HDFS cluster, with single-node HBase
running on top).
Let me know if the patch is clear. If not, what do I need to change? Worth
doing?
> Coarsen StoreScanner locks to RegionScanner
> -------------------------------------------
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Attachments: 13082.txt
>
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to
> the lock already held by the RegionScanner.
> In tests this shows a considerable scan improvement and reduced CPU usage
> (the fences make the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking
> contract. For example, Phoenix does not lock around RegionScanner.nextRaw()
> as required in the documentation (not picking on Phoenix; this one is my
> fault, as I told them it's OK)
> * Possible starvation of flushes and compactions under heavy read load.
> RegionScanner operations would keep getting the locks, and the
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.
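For coprocessor authors, the locking contract in the second bullet amounts to holding the scanner's monitor around every `nextRaw()` call. The sketch below assumes a hypothetical `RawScanner` interface in place of HBase's `RegionScanner`. Any other requirements the real `nextRaw` documentation places on callers are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RegionScanner; not the HBase interface.
interface RawScanner {
    // Appends the next row's cells to 'out'; returns false when no more rows follow.
    boolean nextRaw(List<String> out);
}

final class CoprocessorScanHelper {
    // Drains the scanner, holding its monitor around each nextRaw() call,
    // as required once StoreScanner no longer locks internally.
    static List<String> drain(RawScanner scanner) {
        List<String> all = new ArrayList<>();
        boolean more = true;
        while (more) {
            List<String> row = new ArrayList<>();
            synchronized (scanner) {
                more = scanner.nextRaw(row);
            }
            all.addAll(row);
        }
        return all;
    }
}
```

Taking the monitor per row rather than for the whole drain gives flushes and compactions a chance to acquire it between rows.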
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)