[
https://issues.apache.org/jira/browse/HBASE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303748#comment-15303748
]
Heng Chen commented on HBASE-15900:
-----------------------------------
I checked the code in HStore.java. The methods that may hold the write lock are listed below:
* bulkLoadHFile
* close
* replaceStoreFiles
* snapshot
* updateStorefiles
I can't find any thread that holds the write lock in the jstack output, but I think
it is related to compaction, because there are lots of entries in rs.log like the
ones below. It seems compaction is stuck or not running, so flushes keep being requested.
{code}
2016-05-27 15:25:50,734 INFO [dx-pipe-regionserver14-online,16020,1464166068185_ChoreService_1] regionserver.HRegionServer: dx-pipe-regionserver14-online,16020,1464166068185-MemstoreFlusherChore requesting flush for region frog_stastic,J\x08\x7F\x04_{211}_1455521065789,1460472441110.d13dc38891b807c97bfc2cab7fa60f86. after a delay of 11759
2016-05-27 15:25:50,734 INFO [dx-pipe-regionserver14-online,16020,1464166068185_ChoreService_1] regionserver.HRegionServer: dx-pipe-regionserver14-online,16020,1464166068185-MemstoreFlusherChore requesting flush for region frog_stastic,\x0F<\x80\x00_211_1453625257737,1464218845975.56a21606788d0eeff0b268b0bb670841. after a delay of 10261
2016-05-27 15:25:50,734 INFO [dx-pipe-regionserver14-online,16020,1464166068185_ChoreService_1] regionserver.HRegionServer: dx-pipe-regionserver14-online,16020,1464166068185-MemstoreFlusherChore requesting flush for region frog_stastic,\x81n\x8D\x00_{311}_1455203248372,1461067416754.248e5726c8fdd029c61433c7f291eed3. after a delay of 11702
{code}
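One general way readers can park in ReadLock.lock() while jstack shows no write-lock holder is a writer that is queued but has not acquired the lock yet: with the default non-fair ReentrantReadWriteLock, a new reader waits if the first queued thread is a writer. Whether that is what happens in this RS is not established. The sketch below only reproduces the pattern in isolation; the class name QueuedWriterSketch, the thread names, and the 200 ms timeout are made up for illustration and are not HBase code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueuedWriterSketch {

    /**
     * Returns true if a new reader timed out on the read lock
     * while no thread actually held the write lock.
     */
    public static boolean readerTimesOutBehindQueuedWriter() throws InterruptedException {
        // Non-fair mode is the default; the NonfairSync in the dump suggests the same mode.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // The calling thread plays a long-running reader.
        lock.readLock().lock();

        // Hypothetical writer, standing in for any method that takes the write lock.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();   // parks until the reader above releases
            lock.writeLock().unlock();
        }, "writer");
        writer.start();

        // Wait until the writer is parked in the lock's wait queue.
        while (!lock.hasQueuedThreads()) {
            Thread.sleep(10);
        }

        // A new reader on another thread: it queues behind the waiting writer.
        AtomicBoolean acquired = new AtomicBoolean(true);
        Thread reader = new Thread(() -> {
            try {
                boolean ok = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                acquired.set(ok);
                if (ok) {
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "new-reader");
        reader.start();
        reader.join();

        // The writer is only queued at this point, so nobody holds the write lock.
        boolean writeHeld = lock.isWriteLocked();

        lock.readLock().unlock();  // let the writer proceed and finish
        writer.join();
        return !acquired.get() && !writeHeld;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("new reader blocked with no write-lock holder: "
            + readerTimesOutBehindQueuedWriter());
    }
}
```

If this pattern applies, the thread to look for in the dump would be one parked in WriteLock.lock() plus a reader that has held the read lock for a long time, rather than a thread that owns the write lock.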
> RS stuck in get lock of HStore
> ------------------------------
>
> Key: HBASE-15900
> URL: https://issues.apache.org/jira/browse/HBASE-15900
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.1
> Reporter: Heng Chen
> Attachments: dump.txt
>
>
> It happened on my production cluster while I was running an MR job. I saved dump.txt
> from this RS's web UI.
> Many threads are stuck here:
> {code}
> Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
>   State: WAITING
>   Blocked count: 477816
>   Waited count: 535255
>   Waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
>     org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
>     org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
>     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
>     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
>     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
>     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     java.lang.Thread.run(Thread.java:745)
> {code}