Samarth Jain created PHOENIX-4131:
-------------------------------------

             Summary: UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock
                 Key: PHOENIX-4131
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4131
             Project: Phoenix
          Issue Type: Bug
            Reporter: Samarth Jain
            Assignee: Samarth Jain


On a local test run, the tests never completed because the mini cluster could 
not shut down. I took a jstack dump and found the following deadlock:

{code}
"RS:0;samarthjai-wsm4:59006" #16265 prio=5 os_prio=31 tid=0x00007fafa6327000 nid=0x37b3f runnable [0x00007000115f5000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Object.wait(Native Method)
        at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.preClose(UngroupedAggregateRegionObserver.java:1201)
        - locked <0x000000072bc406b8> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.call(RegionCoprocessorHost.java:494)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preClose(RegionCoprocessorHost.java:490)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2843)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2805)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2423)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1052)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:157)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:141)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
        at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:139)
        at java.lang.Thread.run(Thread.java:748)
{code}

{code}
"RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=59006" #16246 daemon prio=5 os_prio=31 tid=0x00007fafae856000 nid=0x1abdb waiting for monitor entry [0x00007000102bc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:734)
        - waiting to lock <0x000000072bc406b8> (a java.lang.Object)
        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:236)
        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:281)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2629)
        - locked <0x000000072b625a90> (a org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2833)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
{code}

preClose() holds the object monitor and is waiting for scanReferencesCount to 
drop to 0. doPostScannerOpen() is blocked trying to acquire the same monitor, 
which it needs in order to reduce scanReferencesCount to 0, so neither thread 
can make progress.
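
To illustrate the pattern described above, here is a minimal, self-contained 
sketch. It is not the Phoenix code: the class CloseVsScanSketch, its fields, 
and its methods are hypothetical names invented for this example, and the 
closer gives up after a timeout so that the demo terminates instead of hanging. 
It compares a closer that keeps the monitor while it waits for the scan count 
to reach 0 with one that releases the monitor via Object.wait().

{code}
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the reported pattern; not taken from Phoenix.
class CloseVsScanSketch {
    private final Object lock = new Object();
    private int scanReferences = 1; // assume one scan is already in flight

    // Scanner side: must own the monitor to decrement the counter.
    void finishScan() {
        synchronized (lock) {
            scanReferences--;
            lock.notifyAll();
        }
    }

    // Closer side: waits (bounded by timeoutMs) for the counter to reach 0.
    // Returns true if the in-flight scan drained before the timeout.
    boolean close(boolean releaseMonitor, long timeoutMs, CountDownLatch monitorHeld)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            monitorHeld.countDown(); // the scanner may now try to finish
            while (scanReferences > 0) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break; // give up so the demo does not hang forever
                }
                if (releaseMonitor) {
                    lock.wait(remaining); // releases the monitor while waiting
                } else {
                    Thread.sleep(Math.min(remaining, 10)); // keeps the monitor
                }
            }
            return scanReferences == 0;
        }
    }

    // Runs one closer and one scanner; the scanner only tries to finish
    // once the closer already owns the monitor.
    static boolean simulate(boolean releaseMonitor) {
        CloseVsScanSketch region = new CloseVsScanSketch();
        CountDownLatch monitorHeld = new CountDownLatch(1);
        Thread scanner = new Thread(() -> {
            try {
                monitorHeld.await();
                region.finishScan();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        scanner.start();
        try {
            boolean drained = region.close(releaseMonitor, 500, monitorHeld);
            scanner.join();
            return drained;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closer keeps monitor:    drained=" + simulate(false));
        System.out.println("closer releases monitor: drained=" + simulate(true));
    }
}
{code}

In the first run the scanner stays blocked on the monitor until the closer 
times out, so the count never reaches 0 while the closer is waiting. Without 
the timeout, that run would hang in the same way as the two threads in the 
dump above.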

I think this bug was introduced by the change made in PHOENIX-3111 to solve 
other deadlocks. FYI, [~rajeshbabu], [~sergey.soldatov], [~enis], [~lhofhansl].



