[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

Lars Hofhansl (JIRA) Sat, 21 Feb 2015 21:11:34 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332025#comment-14332025
 ]


Lars Hofhansl commented on HBASE-13082:
---------------------------------------

Quick note why this works:
# StoreScanner is passed an explicit object to sync on in updateReaders (it 
does not care what this object is, just that it needs to sync on it).
# We pass the RegionScannerImpl object down as the "sync" object
# All operations that call any StoreScanner method are synchronized already 
on RegionScannerImpl (except for nextRaw, but that 
requires the caller to do the locking himself)
# Now any region scanner operation will prevent the readers from being updated
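
A minimal sketch of the four points above. The class names (SketchStoreScanner, SketchRegionScanner) and the String "readers" are hypothetical stand-ins, not the real HBase classes and not the code in the attached patch; only the locking shape is meant to match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StoreScanner: it takes no lock on the read path.
class SketchStoreScanner {
    private final Object readerLock;   // explicit "sync" object; the scanner does not care what it is
    private List<String> readers;      // stand-in for the current set of store file readers

    SketchStoreScanner(Object readerLock, List<String> readers) {
        this.readerLock = readerLock;
        this.readers = new ArrayList<>(readers);
    }

    // Read path: deliberately unsynchronized. The caller must already hold readerLock.
    String peek() {
        return readers.isEmpty() ? null : readers.get(0);
    }

    // Flush/compaction path: the only place the store scanner locks,
    // and it locks the object it was handed, not itself.
    void updateReaders(List<String> newReaders) {
        synchronized (readerLock) {
            readers = new ArrayList<>(newReaders);
        }
    }
}

// Hypothetical stand-in for RegionScannerImpl: it passes itself down as the sync object.
class SketchRegionScanner {
    private final List<SketchStoreScanner> stores = new ArrayList<>();

    synchronized SketchStoreScanner addStore(List<String> readers) {
        SketchStoreScanner store = new SketchStoreScanner(this, readers);
        stores.add(store);
        return store;
    }

    // Synchronized on the region scanner, so updateReaders on any store
    // has to wait until this call returns.
    synchronized List<String> next() {
        return nextRaw();
    }

    // Not synchronized: the caller is responsible for holding the lock on this object.
    List<String> nextRaw() {
        List<String> out = new ArrayList<>();
        for (SketchStoreScanner store : stores) {
            out.add(store.peek());
        }
        return out;
    }
}
```

In this sketch a caller of nextRaw would wrap the call in synchronized (regionScanner) { ... }; skipping that is exactly the contract violation that would let updateReaders swap the readers in the middle of a read.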

#4 is much coarser than locking at the StoreScanner object - StoreScanner.peek 
is by far the worst, as it is called all over the place. There is no way in 
StoreScanner (that I see) to avoid locking on every single operation (causing a 
memory fence, read and write barrier in this case). As said above, the lock is 
almost never contended; the problem is the memory fences, which *kill* multi 
core performance.

It leads to the caveat listed above. Very heavy read load can essentially 
prevent flushes or compaction from finishing.
But note that this is *already* the case; it is just currently more likely that 
the flush/compaction will get through, because the locks are more fine grained. 
Check out StoreScanner.next(List<Cell>), it already holds a lock for the entire 
duration of the row fetch. This patch coarsens that to the Scan's batch and up 
to the region. So reads on other stores can lock out flushes/compactions of a 
store.
Also note that compactions usually run a long time, and only need the lock once 
to switch the readers around, same for flushes. Need to do testing but I doubt 
it's an issue.

Fair locking can help here, but comes with other issues.
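
For reference, a sketch of what fair locking could look like. Java's synchronized monitors give no fairness guarantee, so this assumes the region-level monitor is replaced with a java.util.concurrent.locks.ReentrantLock created in fair mode. The class and its methods are hypothetical, not part of the patch. The ReentrantLock documentation notes that fair locks can show lower overall throughput under contention, and an explicit lock still implies memory barriers on lock and unlock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical region-level lock shared by scans and by reader swaps.
class FairRegionLockSketch {
    // true = fair mode: the longest-waiting thread is favored when the lock is
    // released, so a waiting flush/compaction is not overtaken indefinitely by new scans.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int readerGeneration = 0;  // stand-in for "which set of readers is current"

    // Read path: holds the lock for the duration of the (simulated) scan step.
    int scan() {
        lock.lock();
        try {
            return readerGeneration;
        } finally {
            lock.unlock();
        }
    }

    // Flush/compaction path: needs the lock once, briefly, to switch the readers.
    void swapReaders() {
        lock.lock();
        try {
            readerGeneration++;
        } finally {
            lock.unlock();
        }
    }

    boolean isFair() {
        return lock.isFair();
    }
}
```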

I've done local node testing. (local single node HDFS cluster, running single 
node HBase on top)

Let me know if the patch is clear. If not, what do I need to change? Worth 
doing?


> Coarsen StoreScanner locks to RegionScanner
> -------------------------------------------
>
>                 Key: HBASE-13082
>                 URL: https://issues.apache.org/jira/browse/HBASE-13082
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 13082.txt
>
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make 
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking 
> contract. For example Phoenix does not lock RegionScanner.nextRaw() as 
> required in the documentation (not picking on Phoenix, this one is my fault 
> as I told them it's OK)
> * possible starving of flushes and compaction with heavy read load. 
> RegionScanner operations would keep getting the locks and the 
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
