[ https://issues.apache.org/jira/browse/PHOENIX-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144862#comment-16144862 ]
Rajeshbabu Chintaguntla commented on PHOENIX-4131:
--------------------------------------------------
[~samarthjain] are you working on it, or do you want me to take a look?
> UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock
> --------------------------------------------------------------------------------
>
> Key: PHOENIX-4131
> URL: https://issues.apache.org/jira/browse/PHOENIX-4131
> Project: Phoenix
> Issue Type: Bug
> Reporter: Samarth Jain
> Assignee: Samarth Jain
>
> On a local test run, I saw that the tests were not completing because the
> mini cluster couldn't shut down, so I took a jstack and found the
> following deadlock:
> {code}
> "RS:0;samarthjai-wsm4:59006" #16265 prio=5 os_prio=31 tid=0x00007fafa6327000 nid=0x37b3f runnable [0x00007000115f5000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Object.wait(Native Method)
>         at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.preClose(UngroupedAggregateRegionObserver.java:1201)
>         - locked <0x000000072bc406b8> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.call(RegionCoprocessorHost.java:494)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preClose(RegionCoprocessorHost.java:490)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2843)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2805)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2423)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1052)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:157)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:141)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:139)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> "RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=59006" #16246 daemon prio=5 os_prio=31 tid=0x00007fafae856000 nid=0x1abdb waiting for monitor entry [0x00007000102bc000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:734)
>         - waiting to lock <0x000000072bc406b8> (a java.lang.Object)
>         at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:236)
>         at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:281)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2629)
>         - locked <0x000000072b625a90> (a org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2833)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}
> preClose() holds the object monitor while it waits for scanReferencesCount
> to go down to 0. doPostScannerOpen() is blocked trying to acquire the same
> monitor, which it needs in order to bring scanReferencesCount down to 0, so
> neither thread can make progress.
> I think this bug was introduced by the changes in PHOENIX-3111, which were
> made to fix other deadlocks.
> FYI, [~rajeshbabu], [~sergey.soldatov], [~enis], [~lhofhansl].
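For illustration only, here is a minimal sketch (not Phoenix's actual code and not the eventual fix for this issue) of a close path that waits for in-flight scans to drain without locking the scan threads out. The class and method names (ScanRefCounter, acquire(), release(), awaitDrained()) are hypothetical. The point it shows is that Object.wait() releases the monitor while the thread waits, so a thread finishing a scan can still enter the same synchronized block, decrement the counter, and call notifyAll() to wake the closer.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not Phoenix code: a close path that waits for an
 * in-flight scan counter to reach 0 without starving the scan threads.
 */
public class ScanRefCounter {
    private final Object lock = new Object();
    private int scanReferencesCount = 0; // guarded by lock
    private boolean closing = false;     // guarded by lock

    /** Called when a scan starts; refuses new scans once closing has begun. */
    public boolean acquire() {
        synchronized (lock) {
            if (closing) {
                return false;
            }
            scanReferencesCount++;
            return true;
        }
    }

    /** Called when a scan finishes; wakes any thread waiting in awaitDrained(). */
    public void release() {
        synchronized (lock) {
            scanReferencesCount--;
            if (scanReferencesCount == 0) {
                lock.notifyAll();
            }
        }
    }

    /**
     * Marks the counter as closing and waits up to timeoutMs for it to reach 0.
     * lock.wait() gives up the monitor while waiting, so release() can still
     * enter the synchronized block and decrement the counter.
     */
    public boolean awaitDrained(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (lock) {
            closing = true;
            // Loop to tolerate spurious wakeups and re-check the deadline.
            while (scanReferencesCount > 0) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // timed out with scans still in flight
                }
                lock.wait(remainingMs);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ScanRefCounter counter = new ScanRefCounter();
        counter.acquire(); // one scan in flight

        // Simulated scan handler that finishes its scan after a short delay.
        Thread scanner = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            counter.release();
        });
        scanner.start();

        boolean drained = counter.awaitDrained(5_000);
        scanner.join();
        // Prints "drained=true, new scan admitted=false"
        System.out.println("drained=" + drained + ", new scan admitted=" + counter.acquire());
    }
}
```

The contrasting failure mode in the traces above is a thread that keeps the monitor while it waits for the count; any thread that needs that same monitor to lower the count then blocks forever.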
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)