[
https://issues.apache.org/jira/browse/HBASE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307450#comment-15307450
]
Heng Chen commented on HBASE-15900:
-----------------------------------
We could see in 0d32a6bab354e6cc170cd59a2d485797.rs.log, the region could not
be closed due to there is one compaction left
{code}
2016-05-31 14:18:18,414 INFO [PriorityRpcServer.handler=19,queue=1,port=16020]
regionserver.RSRpcServices: Close 0d32a6bab354e6cc170cd59a2d485797, moving to
dx-pipe-regionserver8-online,16020,1464675204532
2016-05-31 14:18:18,443 DEBUG
[RS_CLOSE_REGION-dx-pipe-regionserver20-online:16020-1]
handler.CloseRegionHandler: Processing close of
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.
2016-05-31 14:18:18,443 DEBUG
[RS_CLOSE_REGION-dx-pipe-regionserver20-online:16020-1] regionserver.HRegion:
Closing
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.:
disabling compactions & flushes
2016-05-31 14:18:18,443 DEBUG
[RS_CLOSE_REGION-dx-pipe-regionserver20-online:16020-1] regionserver.HRegion:
waiting for 1 compactions to complete for region
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.
{code}
The left compaction is one major compaction run in longCompaction thread, the
progress of it reached 81.23%, then it has no response after that.
{code}
2016-05-31 05:28:29,476 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
regionserver.HStore: 0d32a6bab354e6cc170cd59a2d485797 - info: Initiating major
compaction (all files)
2016-05-31 05:28:29,476 INFO
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
regionserver.HRegion: Starting compaction on info in region
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.
2016-05-31 05:28:29,476 INFO
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
regionserver.HStore: Starting compaction of 9 file(s) in info of
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.
into
tmpdir=hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/.tmp,
totalSize=5.7 G
2016-05-31 05:28:29,476 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/c31e84594c444d3cb8de00029e24aa34,
keycount=78408609, bloomtype=ROW, size=4.9 G, encoding=NONE, seqNum=77065948,
earliestPutTs=1453899210588
2016-05-31 05:28:29,476 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/d3310d547e494e7db2b2e952e5fb412e,
keycount=6356215, bloomtype=ROW, size=428.0 M, encoding=NONE, seqNum=78270525,
earliestPutTs=1463613139775
2016-05-31 05:28:29,477 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/8b11d48ac6a241f1b3ace1a822f087b0,
keycount=1850675, bloomtype=ROW, size=124.6 M, encoding=NONE, seqNum=78624661,
earliestPutTs=1464048401491
2016-05-31 05:28:29,477 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/514ea98b95eb4b5fa4e122ebdd32d004,
keycount=443135, bloomtype=ROW, size=29.9 M, encoding=NONE, seqNum=78710458,
earliestPutTs=1464306574727
2016-05-31 05:28:29,478 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/bf4877dbdf804492878750c30f3cfd8e,
keycount=780345, bloomtype=ROW, size=52.6 M, encoding=NONE, seqNum=78852324,
earliestPutTs=1464307875726
2016-05-31 05:28:29,478 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/79ae0770fb58484e866c7b48f84e16a4,
keycount=357040, bloomtype=ROW, size=24.1 M, encoding=NONE, seqNum=78916832,
earliestPutTs=1464391849632
2016-05-31 05:28:29,478 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/e7708a72734448c5a86305a0224af683,
keycount=1725115, bloomtype=ROW, size=116.1 M, encoding=NONE, seqNum=79190322,
earliestPutTs=1464393070342
2016-05-31 05:28:29,479 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/1dcd54598541492bb20c279716488b5a,
keycount=869015, bloomtype=ROW, size=58.6 M, encoding=NONE, seqNum=79334888,
earliestPutTs=1464588026997
2016-05-31 05:28:29,479 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compacting
hdfs://f04/hbase/data/default/frog_stastic/0d32a6bab354e6cc170cd59a2d485797/info/0a76eb5184dd4d42bdd16a588d295d0e,
keycount=53330, bloomtype=ROW, size=3.7 M, encoding=NONE, seqNum=79346806,
earliestPutTs=1464591201410
2016-05-31 05:29:29,684 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compaction progress:
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.#info
14047066/90843479 (15.46%), rate=14910.65 kB/sec, throughputController is
NoLimitCompactionThroughputController
2016-05-31 05:30:29,684 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compaction progress:
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.#info
23487276/90843479 (25.85%), rate=10021.16 kB/sec, throughputController is
NoLimitCompactionThroughputController
2016-05-31 05:31:29,684 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compaction progress:
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.#info
39884696/90843479 (43.90%), rate=17406.01 kB/sec, throughputController is
NoLimitCompactionThroughputController
2016-05-31 05:32:29,686 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compaction progress:
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.#info
45639621/90843479 (50.24%), rate=6103.74 kB/sec, throughputController is
NoLimitCompactionThroughputController
2016-05-31 05:33:30,012 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compaction progress:
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.#info
61288969/90843479 (67.47%), rate=16531.36 kB/sec, throughputController is
NoLimitCompactionThroughputController
2016-05-31 05:34:30,225 DEBUG
[regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827]
compactions.Compactor: Compaction progress:
frog_stastic,$+\xD6\x05_{211}_1454335203,1462760599763.0d32a6bab354e6cc170cd59a2d485797.#info
73794589/90843479 (81.23%), rate=13231.85 kB/sec, throughputController is
NoLimitCompactionThroughputController
{code}
And in jstack, we could see the longCompaction thread is in RUNNABLE state,
but it seems stuck.
{code}
"regionserver/dx-pipe-regionserver20-online/10.11.52.80:16020-longCompactions-1464599181827"
#727 daemon prio=5 os_prio=0 tid=0x00007f5324412000 nid=0x3796 runnable
[0x00007f52fc5a8000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000006b5900d70> (a sun.nio.ch.Util$2)
- locked <0x00000006b58fa2a8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000006b58d4400> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at
org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:186)
at
org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
- locked <0x00000006b597e8f8> (a
org.apache.hadoop.hdfs.RemoteBlockReader2)
at
org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:686)
at
org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:742)
- eliminated <0x000000062d8aeed8> (a
org.apache.hadoop.hdfs.DFSInputStream)
at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:799)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
- locked <0x000000062d8aeed8> (a org.apache.hadoop.hdfs.DFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:149)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:708)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.isNextBlock(HFileReaderV2.java:833)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.positionForNextBlock(HFileReaderV2.java:828)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2._next(HFileReaderV2.java:845)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:865)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:139)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:596)
at
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:255)
at
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
at
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1212)
at
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1798)
at
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:519)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000062afa5528> (a
java.util.concurrent.ThreadPoolExecutor$Worker)
- <0x000000062f7ee850> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
- <0x00000006b5985ec8> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
{code}
> RS stuck in get lock of HStore
> ------------------------------
>
> Key: HBASE-15900
> URL: https://issues.apache.org/jira/browse/HBASE-15900
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.1, 1.3.0
> Reporter: Heng Chen
> Attachments: 0d32a6bab354e6cc170cd59a2d485797.jstack.txt,
> 0d32a6bab354e6cc170cd59a2d485797.rs.log, 9fe15a52_9fe15a52_save,
> c91324eb_81194e359707acadee2906ffe36ab130.log, dump.txt
>
>
> It happens on my production cluster when i run MR job. I save the dump.txt
> from this RS webUI.
> Many threads stuck here:
> {code}
> Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
> 32 State: WAITING
> 31 Blocked count: 477816
> 30 Waited count: 535255
> 29 Waiting on
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
> 28 Stack:
> 27 sun.misc.Unsafe.park(Native Method)
> 26 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 25
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 24
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> 23
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> 22
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> 21 org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
> 20
> org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
> 19
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
> 18
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
> 17
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
> 16
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
> 15
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
> 14
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
> 13
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
> 12 org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
> 11 org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> 10
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> 9 org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> 8 java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)