[
https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
huaxiang sun updated HBASE-19163:
---------------------------------
Resolution: Fixed
Fix Version/s: 1.5.0
               2.0.0
Status: Resolved (was: Patch Available)
> "Maximum lock count exceeded" from region server's batch processing
> -------------------------------------------------------------------
>
> Key: HBASE-19163
> URL: https://issues.apache.org/jira/browse/HBASE-19163
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Major
> Fix For: 2.0.0, 1.5.0
>
> Attachments: HBASE-19163-branch-1-v001.patch,
> HBASE-19163-branch-1-v001.patch, HBASE-19163-master-v001.patch,
> HBASE-19163.master.001.patch, HBASE-19163.master.002.patch,
> HBASE-19163.master.004.patch, HBASE-19163.master.005.patch,
> HBASE-19163.master.006.patch, HBASE-19163.master.007.patch,
> HBASE-19163.master.008.patch, HBASE-19163.master.009.patch,
> HBASE-19163.master.009.patch, HBASE-19163.master.010.patch, unittest-case.diff
>
>
> In one of our use cases, we hit the following exception and replication got
> stuck.
> {code}
> 2017-10-25 19:41:17,199 WARN [hconnection-0x28db294f-shared--pool4-t936] client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last exception: java.io.IOException: java.io.IOException: Maximum lock count exceeded
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
> Caused by: java.lang.Error: Maximum lock count exceeded
> at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528)
> at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
> at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163)
> at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018)
> at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877)
> at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
> ... 3 more
> {code}
> While we are still examining the data pattern, it is clear that the batch
> contains too many mutations against the same row. Since a shared (read) lock on
> that row is acquired for each mutation, the batch exceeds the maximum of ~64k
> shared lock holds on the row's ReentrantReadWriteLock, which throws an Error and
> fails the whole batch.
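> As a standalone illustration of the limit (plain JDK code, not HBase code), the
> snippet below overflows ReentrantReadWriteLock's 16-bit read-hold counter the
> same way a >64k-mutation batch against one row does:
> {code}
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> // Standalone repro of the JDK limit behind this error (not HBase code).
> // ReentrantReadWriteLock tracks shared holds in 16 bits, so the 65536th
> // read-lock acquisition throws java.lang.Error("Maximum lock count exceeded"),
> // just as HRegion.getRowLock() does when a batch takes >64k holds on one row.
> public class RowLockLimitRepro {
>   public static void main(String[] args) {
>     ReentrantReadWriteLock rowLock = new ReentrantReadWriteLock();
>     for (int i = 0; i < 70_000; i++) {
>       try {
>         rowLock.readLock().lock();   // one shared hold per simulated mutation
>       } catch (Error e) {
>         // fails at hold #65536 with "Maximum lock count exceeded"
>         System.out.println("Failed at hold #" + (i + 1) + ": " + e.getMessage());
>         return;
>       }
>     }
>   }
> }
> {code}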
> There are two approaches to solve this issue.
> 1). When there are multiple mutations against the same row in the batch, acquire
> the row lock once per unique row instead of once per mutation (see the sketch
> below).
> 2). Catch the Error, process whatever mutations did acquire their locks, and
> loop back for the rest.
> With HBASE-17924, approach 1 seems easy to implement now.
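> As a rough sketch of approach 1 only (not the actual patch; it assumes the
> HRegion#getRowLock(byte[], boolean) and RowLock#release() API of these
> branches), the idea is to take the shared row lock once per distinct row in a
> mini-batch:
> {code}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.TreeMap;
>
> import org.apache.hadoop.hbase.client.Mutation;
> import org.apache.hadoop.hbase.regionserver.HRegion;
> import org.apache.hadoop.hbase.regionserver.Region.RowLock;
> import org.apache.hadoop.hbase.util.Bytes;
>
> // Illustration of approach 1 only -- NOT the actual HBASE-19163 patch.
> // Acquire the shared row lock once per distinct row in a mini-batch instead of
> // once per mutation, so duplicate rows can no longer push the read-hold count
> // of the row's ReentrantReadWriteLock past its ~64k limit.
> public class DedupRowLockSketch {
>   static List<RowLock> lockOncePerRow(HRegion region, List<? extends Mutation> miniBatch)
>       throws IOException {
>     TreeMap<byte[], RowLock> byRow = new TreeMap<>(Bytes.BYTES_COMPARATOR);
>     for (Mutation m : miniBatch) {
>       byte[] row = m.getRow();
>       if (!byRow.containsKey(row)) {
>         // shared (read) lock, taken only for the first mutation seen on this row
>         byRow.put(row, region.getRowLock(row, true));
>       }
>     }
>     // caller applies all mutations under these locks, then releases each exactly once
>     return new ArrayList<>(byRow.values());
>   }
> }
> {code}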
> Creating this jira; will post updates/patches as the investigation moves forward.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)