[ 
https://issues.apache.org/jira/browse/PHOENIX-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144862#comment-16144862
 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-4131:
--------------------------------------------------

[~samarthjain] are you working on this, or do you want me to take a look?

> UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can 
> deadlock
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4131
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>
> On my local test run I saw that the tests were not completing because the 
> mini cluster couldn't shut down. So I took a jstack and discovered the 
> following deadlock:
> {code}
> "RS:0;samarthjai-wsm4:59006" #16265 prio=5 os_prio=31 tid=0x00007fafa6327000 
> nid=0x37b3f runnable [0x00007000115f5000]
>    java.lang.Thread.State: RUNNABLE
>       at java.lang.Object.wait(Native Method)
>       at 
> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.preClose(UngroupedAggregateRegionObserver.java:1201)
>       - locked <0x000000072bc406b8> (a java.lang.Object)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.call(RegionCoprocessorHost.java:494)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preClose(RegionCoprocessorHost.java:490)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2843)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2805)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2423)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1052)
>       at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:157)
>       at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
>       at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:141)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:360)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>       at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
>       at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:139)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> "RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=59006" #16246 daemon 
> prio=5 os_prio=31 tid=0x00007fafae856000 nid=0x1abdb waiting for monitor 
> entry [0x00007000102bc000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at 
> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:734)
>       - waiting to lock <0x000000072bc406b8> (a java.lang.Object)
>       at 
> org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:236)
>       at 
> org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:281)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2629)
>       - locked <0x000000072b625a90> (a 
> org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2833)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}
> preClose() holds the object monitor and is waiting for scanReferencesCount to 
> drop to 0. doPostScannerOpen() is trying to acquire the same monitor so that 
> it can decrement scanReferencesCount, so neither thread can make progress.
> I think this bug was introduced by the change in PHOENIX-3111, which was made 
> to solve other deadlocks. 
> FYI, [~rajeshbabu], [~sergey.soldatov], [~enis], [~lhofhansl].
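
For illustration only, here is a minimal sketch of a reference-counted close guard in which the closing thread does not keep the monitor held while it waits. It is not the Phoenix code and not a proposed patch: the class ScanGuard and its methods beginScan(), endScan() and awaitNoScans() are hypothetical names. The idea is that the closer blocks in Object.wait(), which releases the monitor for the duration of the wait, so scanner threads can still enter the synchronized block to decrement the count and notify.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not Phoenix code): counts in-flight scans and lets a closer wait for them. */
final class ScanGuard {
    private final Object lock = new Object();
    private int activeScans = 0;
    private boolean closing = false;

    /** Registers a scan; returns false if a close has already started. */
    boolean beginScan() {
        synchronized (lock) {
            if (closing) {
                return false;
            }
            activeScans++;
            return true;
        }
    }

    /** Unregisters a scan and wakes the closer once the last scan finishes. */
    void endScan() {
        synchronized (lock) {
            activeScans--;
            if (activeScans == 0) {
                lock.notifyAll();
            }
        }
    }

    /**
     * Marks the guard as closing and waits until no scans remain.
     * Returns false on timeout or interruption.
     */
    boolean awaitNoScans(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            closing = true;
            while (activeScans > 0) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // never call wait(0), which would block indefinitely
                }
                try {
                    // wait() releases the monitor, so endScan() can acquire it
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

In this sketch a scanner would call beginScan() before doing work and endScan() in a finally block, while the close path would call awaitNoScans() with a bounded timeout so that a stuck scan cannot block shutdown forever.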



