[ 
https://issues.apache.org/jira/browse/HBASE-14689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010201#comment-15010201
 ] 

Enis Soztutar commented on HBASE-14689:
---------------------------------------

From the hbase-dev thread, this seems to cause an issue with row lock timeouts, 
found during 1.0.3 and 1.1.3 RC testing. I was not able to reproduce the row 
lock timeouts and client hangs using the 0.98.16 RC. 

Running a single-node setup with an SSD disk and running: 
{code}
bin/hbase pe --latency --nomapred --presplit=10 randomWrite 10
{code}
reproduces the problem for me easily. 

This is the stack trace reported from the handlers, which then get blocked 
indefinitely: 
{code}
2015-11-17 19:38:04,267 WARN  [B.defaultRpcServer.handler=4,queue=1,port=61707] regionserver.HRegion: Failed getting lock in batch put, row=00000000000000000000085521
java.io.IOException: Timed out waiting for lock for row: 00000000000000000000085521
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:3995)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2661)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2519)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2473)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2477)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:654)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:618)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1864)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2049)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:111)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:745)
{code}
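
For context, the exception in the trace is thrown from the bounded row-lock 
wait in HRegion.getRowLockInternal. The sketch below is not the HBase 
implementation, only a minimal illustration of that pattern with hypothetical 
names (RowLockSketch, lockRow, unlockRow; the 30-second wait just mirrors the 
intent of hbase.rowlock.wait.duration): each row maps to a latch, acquisition 
waits a bounded time, and a lock that is never released surfaces as the 
IOException in the log instead of an unbounded wait. The indefinite blocking 
reported above is, per the issue description below, the case where the caller 
keeps looping on acquisition instead of failing the operation. 
{code}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Illustration only: a simplified per-row lock with a bounded wait, loosely in
 * the spirit of HRegion.getRowLockInternal. Names and structure are
 * hypothetical, not the actual HBase implementation.
 */
public class RowLockSketch {

  private final ConcurrentMap<String, CountDownLatch> lockedRows =
      new ConcurrentHashMap<>();
  // Bounded wait, analogous in intent to hbase.rowlock.wait.duration.
  private final long rowLockWaitMs = 30000L;

  /** Acquire the lock for a row, or fail with IOException after the wait. */
  public void lockRow(String row) throws IOException, InterruptedException {
    CountDownLatch myLatch = new CountDownLatch(1);
    while (true) {
      CountDownLatch existing = lockedRows.putIfAbsent(row, myLatch);
      if (existing == null) {
        return;  // we now hold the lock for this row
      }
      // Another thread holds the row lock: wait a bounded time for release.
      if (!existing.await(rowLockWaitMs, TimeUnit.MILLISECONDS)) {
        // This is the "Timed out waiting for lock for row" situation above.
        // If the holder leaked the lock and never releases it, a caller that
        // keeps retrying here instead of failing the operation hangs forever.
        throw new IOException("Timed out waiting for lock for row: " + row);
      }
    }
  }

  /** Release the lock for a row and wake up any waiters. */
  public void unlockRow(String row) {
    CountDownLatch latch = lockedRows.remove(row);
    if (latch != null) {
      latch.countDown();
    }
  }
}
{code}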

I've reverted the patch in all branches to be on the safe side until we 
understand the issue better. Sorry for the trouble. 

> Addendum and unit test for HBASE-13471
> --------------------------------------
>
>                 Key: HBASE-14689
>                 URL: https://issues.apache.org/jira/browse/HBASE-14689
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
>         Attachments: hbase-14689_v1-branch-1.1.patch, 
> hbase-14689_v1-branch-1.1.patch, hbase-14689_v1.patch
>
>
> One of our customers ran into HBASE-13471, which resulted in all the handlers 
> getting blocked and various other issues. While backporting the fix, I noticed 
> that there is one more case where we might go into an infinite loop: if a row 
> lock cannot be acquired (for example due to a previous leak, which we have 
> seen in Phoenix before), we end up in a similar infinite loop. 
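
To make that last case concrete, here is a deliberately tiny, self-contained 
sketch (hypothetical class name; a plain java.util.concurrent.Semaphore stands 
in for a leaked per-row lock; not HBase code): retrying acquisition forever 
against a lock that will never be released hangs the thread, while a bounded 
wait turns the same situation into a failed operation and a freed handler. 
{code}
import java.io.IOException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustration only, hypothetical names: the semaphore stands in for a per-row
// lock that was leaked (acquired once and never released).
public class LeakedRowLockDemo {
  public static void main(String[] args) throws Exception {
    Semaphore rowLock = new Semaphore(1);
    rowLock.acquire();  // simulate the leak: acquired, never released

    // Problematic shape: retry acquisition forever. Against a leaked lock this
    // loop never terminates, which is how a handler thread ends up stuck.
    // while (!rowLock.tryAcquire()) { /* spin forever */ }

    // Bounded shape: give up after a deadline and fail the operation instead.
    if (!rowLock.tryAcquire(2, TimeUnit.SECONDS)) {
      throw new IOException(
          "Timed out waiting for lock for row: 00000000000000000000085521");
    }
  }
}
{code}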



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
