[ https://issues.apache.org/jira/browse/HBASE-18144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032513#comment-16032513 ]
stack commented on HBASE-18144:
-------------------------------

h1. Setup

In hbase-1.1.x and earlier, hbase took an exclusive lock on the WHOLE row whether the operation was a read, a write, or an increment (read-modify-write). Absolute single-threaded access crudely ensured we returned a consistent view. The hbase-1.1.x system was also indiscriminate about which queued handler next got access; i.e. it was not 'fair'. In hbase-1.2.x, the hbase row lock was moved to use a reentrant read/write native java lock as its core (HBASE-12751); we took the read lock whenever we read or mutated the row -- MVCC ensured a consistent view while the row was under modification -- and the write lock was taken only for the usually 'rare' checkAndPut-type operations; i.e. checkAndPut, checkAndDelete, Increment, and Append. The non-exclusive access to the row upped throughput (in part because we were also able to simplify our read/write path using the new approach). The fairness helped latencies.

hbase-1.2.x has been out for a good while now. To my knowledge, no one up to this point has complained about the new-style locking. What happens, though, when you have a loading that is mostly increments, all on a single row; i.e. lots of write (exclusive) locks concurrent with lots of read locks? We found that the old exclusive lock performs way better than the hbase-1.2.x read/write locking in at least one extreme -- so much so that we need to bring it back as an option, if only so we can carry such 'extreme' increment-type workloads on newer hbases (we should work on making write-only increments first class in hbase, but that is for another issue). This issue is about adding back the exclusive lock as a table option.

h1. Problem

In essence, at the user's deploy, all threads were in variations of the below:

{code}
"RpcServer.FifoWFPBQ.default.handler=190,queue=10,port=60020" #243 daemon prio=5 os_prio=0 tid=0x00007fbb58691800 nid=0x2d2527 waiting on condition [0x00007fbb4ca49000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000007c6001b38> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5171)
        at org.apache.hadoop.hbase.regionserver.HRegion.doIncrement(HRegion.java:7453)
{code}

Or this:

{code}
"RpcServer.FifoWFPBQ.default.handler=180,queue=0,port=60020" #233 daemon prio=5 os_prio=0 tid=0x00007fbb586ed800 nid=0x2d251d waiting on condition [0x00007fbb4d453000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000354976c00> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5171)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3017)
        ...
{code}

All threads
are stuck trying to get the row lock. Various diggings showed that all 200 handlers (200 in this case) were NOT trying to get to the same row; rather, about 16 rows were being addressed, with varying numbers of handlers backed up on each (from 50 handlers in one case down to 10 or so at the other extreme). It looks like a deadlock. You'd have expected some of the handlers to be inside holding the lock, but in this thread dump there were none. Thread dumps would stay in this stasis for minutes. Progress seemed to be going on because the players changed across thread dumps (we figured the thread dump taken at 'safe point' was not giving us a true picture of the actual state of affairs). Eventually we got over the hump and the RegionServer came alive again. Looking at the same load on an hbase-1.0.x, we'd see handlers stuck trying to get locks, but only about 1/5th of all handlers would be stuck at peak.

h1. Investigation

It took a while to arrive back at locking as the culprit. The tea leaves as noted above pointed at locking, but we wouldn't buy it: "hbase-1.2.x has been shipping for ages...", "Measurement shows the new locking taking 10-20% longer, but lock-acquisition time is so short it is probably immeasurable", etc. A big problem was our inability to faithfully reproduce the loading locally ([~esteban] and I made distinct attempts in two different environments). While eventually in simulations we could reproduce the 'weird' thread dump the user was seeing, 1.2.x always had overall better throughput than 1.0.x in our simulations (lock-acquisition time was indeed longer in 1.2.x... by about 20%). Here are some of the paths we took following the wild goose (we spent a good while on this issue -- here is a sketch of what we were at):

h2. Row Key Hash

We spent time on the key used to finger locks. It was made from a hash of the row key bytes. The hashing algorithm is poor.
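For illustration of the hypothesis we were chasing: row locks are fingered by a hash of the row key rather than by the key bytes themselves, so in principle distinct rows can land on the same lock entry. Below is a minimal toy sketch of that idea -- hypothetical names throughout, a {{ConcurrentHashMap}} standing in for the real lock pool, and a deliberately bad hash; this is NOT the HBase implementation, just the shape of the failure mode we speculated about:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Toy model of hash-fingered row locks. The lock table is keyed by a
 * hash of the row key, not the key itself, so two distinct rows whose
 * hashes collide contend on the very same lock object.
 * All names here are hypothetical, for illustration only.
 */
public class RowLockStriping {
    private final ConcurrentHashMap<Integer, ReentrantReadWriteLock> locks =
            new ConcurrentHashMap<>();

    // A deliberately poor hash: it only looks at the first byte, so long
    // keys sharing a prefix all collide -- the kind of weakness that
    // could broaden the effective scope of a row lock.
    public static int poorHash(byte[] rowKey) {
        return rowKey.length == 0 ? 0 : rowKey[0];
    }

    public ReentrantReadWriteLock lockFor(byte[] rowKey) {
        return locks.computeIfAbsent(poorHash(rowKey), h -> new ReentrantReadWriteLock());
    }

    public static void main(String[] args) {
        RowLockStriping striping = new RowLockStriping();
        byte[] rowA = "user123-profile".getBytes();
        byte[] rowB = "user456-counter".getBytes();
        // Distinct rows, but the poor hash collides on the shared 'u'
        // prefix: both rows are gated by the same lock object.
        System.out.println(striping.lockFor(rowA) == striping.lockFor(rowB)); // prints "true"
    }
}
```

In a model like this, a writer locking one row would also stall every reader and writer of any other row that hashes to the same entry.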
We speculated that clashes were broadening the scope of the lock, or that there was a scenario where we obtained the lock under one hash but a clashing hash somehow unlocked another, leaving locks in place. Interesting findings included the fact that hashing took longer the longer the row key was (the user's keys were 150-200 characters). When we found that the row key hash in 1.1.x was the same as in 1.2.x, that put an end to this particular pursuit.

h2. Compare

We expended a bunch of effort comparing locking in 1.0.x and 1.2.x. We measured how long lock acquisition was taking (fair vs unfair, exclusive vs read/write) and how many locks were being taken. It looked like 1.2.x was taking many more than 1.0.x. Comparing code bases, it looked like we'd dropped an optimization going to 1.2 here:

{code}
commit b15a05d7b611726ff9b2dd1a3c5c9448b41e1474
Author: Enis Soztutar <e...@apache.org>
Date:   Wed Oct 28 18:16:32 2015 -0700

    HBASE-14689 Addendum and unit test for HBASE-13471
{code}

... a patch that was subsequently messily reverted, then backported here:

{code}
commit bae2a970817d3693946c5b1d4e2382e56b1e94b8
Author: stack <st...@apache.org>
Date:   Wed Sep 30 11:48:46 2015 -0700

    HBASE-14465 Backport 'Allow rowlock to be reader/write' to branch-1

       // If we haven't got any rows in our batch, we should block to
       // get the next one.
-      boolean shouldBlock = numReadyToWrite == 0;
       RowLock rowLock = null;
       try {
-        rowLock = getRowLockInternal(mutation.getRow(), shouldBlock);
+        rowLock = getRowLock(mutation.getRow(), true);
       } catch (IOException ioe) {
         LOG.warn("Failed getting lock in batch put, row=" + Bytes.toStringBinary(mutation.getRow()), ioe);
       }
       if (rowLock == null) {
         // We failed to grab another lock
-        assert !shouldBlock : "Should never fail to get lock when blocking";
+        assert false : "Should never fail to get lock when blocking";
{code}

It turned out we were counting locking incorrectly, and there was a misunderstanding around lock aggregation (HBASE-18074).
We even compared running on different JVM versions.

h2. Improving Lock-access Time

The idea here was to improve the 1.2 locking regime. We did profiling and then experiments: refactors of lock acquisition making do with fewer Atomics and volatiles, and an investigation of row sorting so the lock could be taken once for a group of edits. We gave up on this route after seeing how little improvement we could make, and after figuring that this code, which originally came from FB, had already had a lot of thought applied, so pickings would be slim.

h1. Culprit: ReentrantReadWriteLock performance goes south if writes dominate

RRWL is from circa 2005 (JDK5?). There is no good to be had looking in the bug database for issues in 2017. But from the javadoc: "ReentrantReadWriteLocks can be used to improve concurrency in some uses of some kinds of Collections. This is typically worthwhile only when the collections are expected to be large, and entail operations with overhead that outweighs synchronization overhead." And: "...if updates become frequent then the data spends most of its time being exclusively locked and there is little, if any increase in concurrency." Or Ben Manes in https://groups.google.com/forum/#!topic/guava-discuss/Vqu34OODWdw: "Read/Write (shared) locks are expensive to maintain, much more so than exclusive locks. If the lock hold time is small then most often a regular lock (synchronized or ReentrantLock) is faster. It is only when the lock hold time is large does the benefit of a shared lock become apparent." This article on RWLock being AWFUL -- https://www.javacodegeeks.com/2014/06/java-8-stampedlocks-vs-readwritelocks-and-synchronized.html -- gets dinged in the comments section for its testing technique, but its setup was the only one that resembled the loading of the user under scrutiny (hundreds of concurrent threads with a good portion taking write locks); i.e. a read from hbase can turn into 1k Puts followed by 1k Increments (IIUC), all on the one row.
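To show the shape of the comparison rather than claim numbers, here is a minimal, hedged contention sketch -- not a rigorous benchmark and not HBase code. It drives an all-write load (every operation a read-modify-write increment on one shared counter, standing in for the single hot row) through a plain exclusive {{ReentrantLock}} and through a {{ReentrantReadWriteLock}}'s write lock; with no readers to share with, the RRWL buys no concurrency but still carries the shared-lock bookkeeping the javadoc warns about. All names are made up for this sketch:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Crude contention sketch: N threads each perform M "increments" on a
 * shared counter guarded by the supplied lock. Timings are illustrative
 * only; JIT warmup, safepoints, etc. are deliberately ignored.
 */
public class LockContentionSketch {
    public static long counter;

    public static long run(Lock lock, int threads, int incrementsPerThread)
            throws InterruptedException {
        counter = 0;
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    lock.lock();
                    try {
                        counter++;   // stand-in for the Increment's read-modify-write
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 16, increments = 5_000;
        // Fair locks, matching the ReentrantReadWriteLock$FairSync seen
        // parked in the thread dumps above.
        long exclusiveMs = run(new ReentrantLock(true), threads, increments);
        long rwWriteMs   = run(new ReentrantReadWriteLock(true).writeLock(), threads, increments);
        System.out.println("exclusive: " + exclusiveMs + "ms, rw-write: " + rwWriteMs + "ms");
        // Both regimes produce the same correct count; only the cost differs.
        System.out.println(counter == (long) threads * increments); // prints "true"
    }
}
```

Whether the write lock actually loses, and by how much, depends on thread count, fairness, and hold time -- which is consistent with our simulations disagreeing with what the user's production showed.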
So, we decided to try forward-porting the old mechanism -- an exclusive (reentrant) lock rather than a RRWL as our locking basis. Testing in my setup, the exclusive lock seemed to perform worse, but deployed at the user's production, it does much better than the 1.2.x default locking regime. Let me forward-port the old locking mechanism in case anyone else has a workload like the above user's.

> Forward-port the old exclusive row lock; there are scenarios where it
> performs better
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-18144
>                 URL: https://issues.apache.org/jira/browse/HBASE-18144
>             Project: HBase
>          Issue Type: Bug
>          Components: Increment
>    Affects Versions: 1.2.5
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0, 1.3.2, 1.2.7
>
>
> Description to follow.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)