[
https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
huaxiang sun updated HBASE-19163:
---------------------------------
Resolution: Fixed
Fix Version/s: 1.5.0
               2.0.0
Status: Resolved (was: Patch Available)
> "Maximum lock count exceeded" from region server's batch processing
> -------------------------------------------------------------------
>
> Key: HBASE-19163
> URL: https://issues.apache.org/jira/browse/HBASE-19163
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Major
> Fix For: 2.0.0, 1.5.0
>
> Attachments: HBASE-19163-branch-1-v001.patch,
> HBASE-19163-branch-1-v001.patch, HBASE-19163-master-v001.patch,
> HBASE-19163.master.001.patch, HBASE-19163.master.002.patch,
> HBASE-19163.master.004.patch, HBASE-19163.master.005.patch,
> HBASE-19163.master.006.patch, HBASE-19163.master.007.patch,
> HBASE-19163.master.008.patch, HBASE-19163.master.009.patch,
> HBASE-19163.master.009.patch, HBASE-19163.master.010.patch, unittest-case.diff
>
>
> In one of our use cases, we hit the following exception and replication got
> stuck.
> {code}
> 2017-10-25 19:41:17,199 WARN [hconnection-0x28db294f-shared--pool4-t936] client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last exception: java.io.IOException: java.io.IOException: Maximum lock count exceeded
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
> Caused by: java.lang.Error: Maximum lock count exceeded
> at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528)
> at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
> at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163)
> at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018)
> at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877)
> at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
> ... 3 more
> {code}
> While we are still examining the data pattern, it is clear that the batch
> contains too many mutations against the same row. Since a shared (read) lock on
> that row is acquired for each mutation, the batch exceeds the maximum of ~64k
> shared lock holds on the row's ReentrantReadWriteLock, which throws an Error and
> fails the whole batch.
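> As a standalone illustration of the limit (plain JDK code, not HBase code), the
> snippet below overflows ReentrantReadWriteLock's 16-bit read-hold counter the
> same way a >64k-mutation batch against one row does:
> {code}
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> // Standalone repro of the JDK limit behind this error (not HBase code).
> // ReentrantReadWriteLock tracks shared holds in 16 bits, so the 65536th
> // read-lock acquisition throws java.lang.Error("Maximum lock count exceeded"),
> // just as HRegion.getRowLock() does when a batch takes >64k holds on one row.
> public class RowLockLimitRepro {
>   public static void main(String[] args) {
>     ReentrantReadWriteLock rowLock = new ReentrantReadWriteLock();
>     for (int i = 0; i < 70_000; i++) {
>       try {
>         rowLock.readLock().lock();   // one shared hold per simulated mutation
>       } catch (Error e) {
>         // fails at hold #65536 with "Maximum lock count exceeded"
>         System.out.println("Failed at hold #" + (i + 1) + ": " + e.getMessage());
>         return;
>       }
>     }
>   }
> }
> {code}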
> There are two approaches to solve this issue.
> 1). When there are multiple mutations against the same row in the batch, acquire
> the row lock once per unique row instead of once per mutation (see the sketch
> below).
> 2). Catch the Error, process whatever mutations did acquire their locks, and
> loop back for the rest.
> With HBASE-17924, approach 1 seems easy to implement now.
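> As a rough sketch of approach 1 only (not the actual patch; it assumes the
> HRegion#getRowLock(byte[], boolean) and RowLock#release() API of these
> branches), the idea is to take the shared row lock once per distinct row in a
> mini-batch:
> {code}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.TreeMap;
>
> import org.apache.hadoop.hbase.client.Mutation;
> import org.apache.hadoop.hbase.regionserver.HRegion;
> import org.apache.hadoop.hbase.regionserver.Region.RowLock;
> import org.apache.hadoop.hbase.util.Bytes;
>
> // Illustration of approach 1 only -- NOT the actual HBASE-19163 patch.
> // Acquire the shared row lock once per distinct row in a mini-batch instead of
> // once per mutation, so duplicate rows can no longer push the read-hold count
> // of the row's ReentrantReadWriteLock past its ~64k limit.
> public class DedupRowLockSketch {
>   static List<RowLock> lockOncePerRow(HRegion region, List<? extends Mutation> miniBatch)
>       throws IOException {
>     TreeMap<byte[], RowLock> byRow = new TreeMap<>(Bytes.BYTES_COMPARATOR);
>     for (Mutation m : miniBatch) {
>       byte[] row = m.getRow();
>       if (!byRow.containsKey(row)) {
>         // shared (read) lock, taken only for the first mutation seen on this row
>         byRow.put(row, region.getRowLock(row, true));
>       }
>     }
>     // caller applies all mutations under these locks, then releases each exactly once
>     return new ArrayList<>(byRow.values());
>   }
> }
> {code}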
> Creating this jira; will post updates/patches as the investigation moves forward.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)