[
https://issues.apache.org/jira/browse/HBASE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319790#comment-15319790
]
Andrew Purtell commented on HBASE-15900:
----------------------------------------
We also saw long stuck flushes or compactions, and blocked region actions,
until we patched our Hadoop for 7005. There have been other fixes for missing
timeouts committed since. We probably should be recommending 2.7.2 or 2.6.4 for
production.
> RS stuck in get lock of HStore
> ------------------------------
>
> Key: HBASE-15900
> URL: https://issues.apache.org/jira/browse/HBASE-15900
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.1, 1.3.0
> Reporter: Heng Chen
> Attachments: 0d32a6bab354e6cc170cd59a2d485797.jstack.txt,
> 0d32a6bab354e6cc170cd59a2d485797.rs.log, 9fe15a52_9fe15a52_save,
> c91324eb_81194e359707acadee2906ffe36ab130.log, dump.txt
>
>
> It happens on my production cluster when i run MR job. I save the dump.txt
> from this RS webUI.
> Many threads stuck here:
> {code}
> Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
> 32 State: WAITING
> 31 Blocked count: 477816
> 30 Waited count: 535255
> 29 Waiting on
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
> 28 Stack:
> 27 sun.misc.Unsafe.park(Native Method)
> 26 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 25
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 24
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> 23
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> 22
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> 21 org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
> 20
> org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
> 19
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
> 18
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
> 17
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
> 16
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
> 15
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
> 14
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
> 13
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
> 12 org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
> 11 org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> 10
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> 9 org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> 8 java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)