[ 
https://issues.apache.org/jira/browse/HBASE-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293312#comment-15293312
 ] 

Jeongdae Kim commented on HBASE-15871:
--------------------------------------

This is the jstack output captured when this issue occurs.

{panel:title=RPC Handler}
"B.defaultRpcServer.handler=188,queue=8,port=16020" daemon prio=10 
tid=0x00007f8f622b9000 nid=0x6a48 runnable [0x00007f8f28306000]
   java.lang.Thread.State: RUNNABLE
        at 
java.util.concurrent.ConcurrentSkipListMap$SubMap$SubMapValueIterator.next(ConcurrentSkipListMap.java:3083)
        at 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.getNext(DefaultMemStore.java:756)
        at 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekInSubLists(DefaultMemStore.java:807)
        - locked <0x00000004b5694b68> (a 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
        at 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seek(DefaultMemStore.java:799)
        - locked <0x00000004b5694b68> (a 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
        at 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.seekToPreviousRow(DefaultMemStore.java:981)
        - locked <0x00000004b5694b68> (a 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
        at 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner.backwardSeek(DefaultMemStore.java:950)
        - locked <0x00000004b5694b68> (a 
org.apache.hadoop.hbase.regionserver.DefaultMemStore$MemStoreScanner)
        at 
org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekScanners(ReversedStoreScanner.java:83)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.resetScannerStack(StoreScanner.java:753)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:728)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:488)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5502)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5653)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5440)
        at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5426)
{panel}

{panel:title=MemStoreFlusher}
"MemStoreFlusher.1" prio=10 tid=0x00007f8f6269e800 nid=0x6a92 waiting on 
condition [0x00007f8f247cb000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000005d3b0ad80> (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:695)
        at 
org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1091)
        at 
org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1070)
        at 
org.apache.hadoop.hbase.regionserver.HStore.access$500(HStore.java:128)
        at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2246)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2327)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2069)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2031)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1923)
        at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1849)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
        at java.lang.Thread.run(Thread.java:745)
{panel}

> Memstore flush doesn't finish because of backwardseek() in memstore scanner.
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-15871
>                 URL: https://issues.apache.org/jira/browse/HBASE-15871
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>    Affects Versions: 1.1.2
>            Reporter: Jeongdae Kim
>         Attachments: memstore_backwardSeek().PNG
>
>
> Sometimes in our production HBase cluster, a memstore flush takes a long 
> time to finish (more than 30 minutes).
> The reason is that a memstore flusher thread calls 
> StoreScanner.updateReaders() and waits to acquire a lock that the store 
> scanner holds in StoreScanner.next(), while backwardSeek() in the memstore 
> scanner runs for a long time.
> I think this condition can occur during a reverse scan through the 
> following sequence:
> 1) Create a reversed store scanner by requesting a reverse scan.
> 2) Flush the memstore in the same HStore.
> 3) Put a lot of cells into the memstore until it is almost full.
> 4) Call next() on the reverse scanner. This re-creates all scanners in this 
> store, because all scanners were already closed by the flush in 2), and 
> calls backwardSeek() with the store's lastTop on all the new scanners.
> 5) In this state, the memstore is almost full because of 3), and all cells 
> in the memstore have a sequenceID greater than this scanner's readPoint 
> because of the flush in 2). This causes a search over all cells in the 
> memstore, and seekToPreviousRow() repeatedly searches cells that were 
> already searched if a row has one column. (This is described in more detail 
> in the attached file.)
> 6) Flush the memstore again in the same HStore. This flush has to wait 
> until steps 4-5) finish before it can update the store files in the same 
> HStore after flushing.
> I searched the HBase JIRA and found a similar issue (HBASE-14497), but the 
> fix for HBASE-14497 can't solve this issue because it only changed a 
> recursive call into a loop.
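
The blocking pattern in the two stack traces can be reproduced in isolation with a minimal sketch. It is not HBase code: the class FlushBlockedByScanSketch, its methods, and the sleep that stands in for a long backwardSeek() are all hypothetical. The only assumption taken from the traces is that next() and updateReaders() contend for the same ReentrantLock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a store scanner; not the real StoreScanner.
public class FlushBlockedByScanSketch {
    // Assumed shared lock, mirroring the ReentrantLock seen in the flusher trace.
    private final ReentrantLock lock = new ReentrantLock();

    // Simulates next(): holds the lock while a slow seek runs.
    void next(long scanMillis, CountDownLatch lockHeld) throws InterruptedException {
        lock.lock();
        try {
            lockHeld.countDown();      // signal that the lock is now held
            Thread.sleep(scanMillis);  // stands in for a long backwardSeek()
        } finally {
            lock.unlock();
        }
    }

    // Simulates updateReaders(): returns how long it waited for the lock, in ms.
    long updateReaders() {
        long start = System.nanoTime();
        lock.lock();
        try {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            lock.unlock();
        }
    }

    // Runs one "RPC handler" thread and measures the flusher's wait on this thread.
    static long measureFlusherWaitMillis(long scanMillis) throws InterruptedException {
        FlushBlockedByScanSketch store = new FlushBlockedByScanSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread handler = new Thread(() -> {
            try {
                store.next(scanMillis, lockHeld);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "rpc-handler");
        handler.start();
        lockHeld.await();                     // wait until the scan holds the lock
        long waited = store.updateReaders();  // blocks until the scan releases it
        handler.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flusher waited ~" + measureFlusherWaitMillis(300) + " ms");
    }
}
```

In this sketch the flusher's wait grows with the time the scan spends under the lock, which matches the symptom reported above: the flush cannot commit until the seek returns.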



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
