huaxiang sun created HBASE-19163:
------------------------------------

             Summary: "Maximum Lock Acquired" from region server's batch processing
                 Key: HBASE-19163
                 URL: https://issues.apache.org/jira/browse/HBASE-19163
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 1.2.7
            Reporter: huaxiang sun
            Assignee: huaxiang sun
            Priority: Major


In one of our use cases, we hit the following exception, and replication is stuck.

{code}
2017-10-25 19:41:17,199 WARN  [hconnection-0x28db294f-shared--pool4-t936] client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last exception: java.io.IOException: java.io.IOException: Maximum lock count exceeded
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
Caused by: java.lang.Error: Maximum lock count exceeded
        at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528)
        at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
        ... 3 more

{code}

While we are still examining the data pattern, it is clear that the batch 
contains too many mutations against the same row. This exceeds the maximum 64k 
shared lock count (65535 read holds on a ReentrantReadWriteLock), so the lock 
throws an Error and the whole batch fails.
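
The limit can be reproduced outside HBase. The following is a minimal sketch 
using only the JDK; the class and method names (ReadLockLimitDemo, 
acquireUntilError) are made up for illustration and are not HBase code. It 
takes the read lock of a single ReentrantReadWriteLock over and over on one 
thread, much as a batch full of mutations against one row would, and counts 
how many holds succeed before the lock throws.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of HBase.
class ReadLockLimitDemo {

    /**
     * Acquires the read lock repeatedly on one thread and returns how many
     * acquisitions succeeded before the lock threw an Error.
     */
    static int acquireUntilError() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        int held = 0;
        try {
            while (true) {
                lock.readLock().lock();
                held++;
            }
        } catch (Error e) {
            // The JDK reports "Maximum lock count exceeded" here.
            return held;
        } finally {
            // Release every hold that was actually acquired.
            for (int i = 0; i < held; i++) {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("read holds before Error: " + acquireUntilError());
    }
}
```

Note that the lock throws java.lang.Error rather than an Exception, which 
matches the "Caused by: java.lang.Error" line in the trace above.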

There are two approaches to solve this issue.

1). When the batch contains several mutations against the same row, acquire 
the row lock once for that row instead of once per mutation.
2). Catch the error, process the mutations whose locks have been acquired so 
far, and loop back for the rest.
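
A rough sketch of approach 1, written against plain JDK locks rather than 
the real HRegion code. Everything here (RowLockDedupSketch, processBatch, 
string row keys, the rowLocks map) is a hypothetical stand-in chosen for 
illustration; the actual patch would work with HRegion.getRowLock and the 
mini-batch structures instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not HBase code: one shared (read) lock hold per
// distinct row in a batch, however many mutations target that row.
class RowLockDedupSketch {

    // Stand-in for the region's per-row lock table.
    private final Map<String, ReentrantReadWriteLock> rowLocks = new ConcurrentHashMap<>();

    /**
     * Walks the batch (one row key per mutation) and returns the number of
     * lock acquisitions that were needed.
     */
    int processBatch(List<String> rowOfEachMutation) {
        Map<String, Lock> held = new LinkedHashMap<>();
        try {
            for (String row : rowOfEachMutation) {
                if (!held.containsKey(row)) {
                    // First mutation for this row in the batch: take the lock once.
                    Lock readLock = rowLocks
                            .computeIfAbsent(row, r -> new ReentrantReadWriteLock())
                            .readLock();
                    readLock.lock();
                    held.put(row, readLock);
                }
                // The mutation for `row` would be applied here, under the held lock.
            }
            return held.size();
        } finally {
            // Release each row lock exactly once.
            for (Lock lock : held.values()) {
                lock.unlock();
            }
        }
    }
}
```

With this shape, a batch of 100000 mutations against one row takes a single 
hold on that row's lock, so the shared lock count stays far below the limit.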

With HBASE-17924, approach 1 seems easy to implement now. 
I am creating this JIRA now and will post updates and a patch as the 
investigation moves forward.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
