Rajeshbabu Chintaguntla created PHOENIX-3111:
------------------------------------------------

             Summary: Possible deadlock/delay while building index, upsert select, delete rows at server
                 Key: PHOENIX-3111
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3111
             Project: Phoenix
          Issue Type: Bug
            Reporter: Sergio Peleato
            Assignee: Rajeshbabu Chintaguntla
            Priority: Blocker
             Fix For: 4.8.0


There is a possible deadlock (or long delay) while building a local index, or while running UPSERT SELECT or DELETE executed at the server. The situation can arise as follows.

These queries scan mutations from a table and write them back to the same table. While doing so, the memstore may reach the blocking memstore size threshold, at which point RegionTooBusyException is thrown back to the client and the query retries the scan.
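For reference, the blocking memstore size mentioned above is governed by two standard HBase settings: a region blocks updates once its memstore grows to roughly flush size × block multiplier. The values below are only illustrative, not a recommendation:

{code:xml}
<!-- hbase-site.xml: settings that define the blocking memstore size.
     Values shown are illustrative; check your cluster's actual configuration. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB: flush a memstore once it reaches this size -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value> <!-- block updates once memstore reaches multiplier x flush size -->
</property>
{code}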

Take the local index build case: we first scan the data table, prepare index mutations, and write them back to the same table. The memstore can fill up, in which case we try to flush the region. But if a split starts in between, the split waits for the write lock to close the region, and the flush then waits for the read lock because the write lock request sits in the queue until the local index build completes. The local index build in turn cannot complete because its writes are blocked until a flush happens. This may not be a strict deadlock, but the queries can take a very long time to complete in these cases.
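To make the lock interaction concrete, here is a small standalone Java sketch (not Phoenix or HBase code; thread names and timings are purely illustrative) reproducing the ReentrantReadWriteLock behaviour seen in the thread dumps below: while one reader (the index build) holds the lock, a queued writer (the split) makes the next reader (the flush) park behind it.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SplitFlushStallDemo {
    public static void main(String[] args) throws Exception {
        ReentrantReadWriteLock regionCloseLock = new ReentrantReadWriteLock();

        // "Index build / upsert select": holds the region close lock as a reader
        // for the whole scan-and-write-back, and cannot finish until a flush
        // frees memstore space.
        regionCloseLock.readLock().lock();

        // "Split": requests the write lock to close the region; parks because a
        // reader is still active.
        Thread split = new Thread(() -> {
            regionCloseLock.writeLock().lock();
            regionCloseLock.writeLock().unlock();
        }, "split");
        split.start();
        TimeUnit.MILLISECONDS.sleep(200); // let the writer get queued first

        // "Flush": requests the read lock; with a writer already queued ahead of
        // it, this reader parks too, so the flush the index build needs never runs.
        Thread flush = new Thread(() -> {
            regionCloseLock.readLock().lock();
            regionCloseLock.readLock().unlock();
        }, "flush");
        flush.start();
        TimeUnit.MILLISECONDS.sleep(200);

        System.out.println("split thread state: " + split.getState()); // WAITING
        System.out.println("flush thread state: " + flush.getState()); // WAITING
        // Nothing releases the first read lock here, mirroring the stall:
        // the index build waits for the flush, the flush waits behind the split,
        // and the split waits for the index build's read lock.
        System.exit(0);
    }
}
{code}

The two thread dumps below show the split and flush threads parked on the same ReentrantReadWriteLock (0x00000006ede72550).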
{noformat}
"regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 os_prio=31 
tid=0x00007f7fb2050800 nid=0x1c033 waiting on condition [0x0000000139b68000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006ede72550> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
        at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
        - locked <0x00000006ede69d00> (a java.lang.Object)
        at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
        at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
        at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
        at 
org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
        at 
org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - <0x00000006ee132098> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}
{noformat}
"MemStoreFlusher.0" #170 prio=5 os_prio=31 tid=0x00007f7fb6842000 nid=0x19303 
waiting on condition [0x00000001388e9000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006ede72550> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1986)
        at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1950)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

As a fix, we need to block region splits while an index build, UPSERT SELECT, or DELETE is running at the server.
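A minimal sketch of one way this could be done, assuming a coprocessor-based approach: keep a count of in-flight server-side write operations and reject the split while the count is non-zero. The class, field, and helper names here are hypothetical; this is not the actual patch:

{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

// Hypothetical observer: the server-side write path (index build, UPSERT SELECT,
// DELETE) would call startServerWrite()/endServerWrite() around its work.
public class BlockSplitDuringServerWriteObserver extends BaseRegionObserver {

    private final AtomicInteger inFlightServerWrites = new AtomicInteger();

    public void startServerWrite() {
        inFlightServerWrites.incrementAndGet();
    }

    public void endServerWrite() {
        inFlightServerWrites.decrementAndGet();
    }

    @Override
    public void preSplit(ObserverContext<RegionCoprocessorEnvironment> e)
            throws IOException {
        // Refuse the split while a server-side scan-and-write is in progress, so
        // the split's write-lock request never queues ahead of the flush that the
        // server-side write is waiting for.
        if (inFlightServerWrites.get() > 0) {
            throw new IOException("Split rejected: server-side index build/upsert "
                + "select/delete in progress on "
                + e.getEnvironment().getRegion().getRegionInfo().getRegionNameAsString());
        }
    }
}
{code}

Rejecting from preSplit just causes the split to be retried later; an alternative would be to wait in preSplit until the in-flight count drops to zero.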

Thanks [~sergey.soldatov] for the help in understanding and analyzing the bug, and [~speleato] for finding it.


