[jira] Commented: (HBASE-2248) New MemStoreScanner copies memstore for each scan, makes short scans slow

Yoram Kulbak (JIRA) Tue, 23 Feb 2010 19:07:51 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837591#action_12837591
 ]


Yoram Kulbak commented on HBASE-2248:
-------------------------------------

As a sanity check, I rolled MemStore back to the state just before 
HBASE-2037 was applied [last commit on 21 Oct 2009]. 
[ To get things going I had to put back the MemStore#numKeyValues method and 
change the MemStore#clearSnapshot argument to SortedSet ]

I then ran TestHRegion and two tests failed:
- testFlushCacheWhileScanning - demonstrates the incorrect scan results 
returned while a snapshot exists
- testWritesWhileScanning - demonstrates 'partial puts' being visible to the 
scanner
I also ran TestMemStore, and all the tests there passed. I didn't try running 
the whole suite.

It took me a while to figure out exactly what goes wrong when a snapshot 
exists. The short (and vague) explanation is that the scanner may return keys 
out of order: a KV with a higher row may be returned before a KV with a lower 
row, because the result list which aggregates results from both the snapshot 
and kvset doesn't guarantee that the KVs are added in sorted order. I think a 
simple test could be added to TestMemStore to demonstrate that.
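
To make the ordering problem concrete, here is a minimal sketch in plain Java. 
It is not the HBase code: the class and method names are made up for 
illustration, row keys are plain Strings rather than KeyValues, and 
naiveAggregate is only an assumed example of one way an aggregating result 
list can end up unsorted. mergedScan shows the two-way merge that keeps the 
combined output in row order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in HBase.
public class MemStoreOrderSketch {

    // Assumed failure mode: results from each sorted source are appended
    // one source after the other, so the combined list is not sorted.
    static List<String> naiveAggregate(SortedSet<String> kvset,
                                       SortedSet<String> snapshot) {
        List<String> result = new ArrayList<>();
        result.addAll(kvset);
        result.addAll(snapshot);
        return result;
    }

    // Two-way merge: always emit the smaller of the two current heads,
    // so the output stays in row order. Equal keys are both emitted.
    static List<String> mergedScan(SortedSet<String> kvset,
                                   SortedSet<String> snapshot) {
        List<String> result = new ArrayList<>();
        Iterator<String> a = kvset.iterator();
        Iterator<String> b = snapshot.iterator();
        String x = a.hasNext() ? a.next() : null;
        String y = b.hasNext() ? b.next() : null;
        while (x != null || y != null) {
            if (y == null || (x != null && x.compareTo(y) <= 0)) {
                result.add(x);
                x = a.hasNext() ? a.next() : null;
            } else {
                result.add(y);
                y = b.hasNext() ? b.next() : null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSet<String> snapshot = new TreeSet<>(Arrays.asList("row1", "row3"));
        SortedSet<String> kvset = new TreeSet<>(Arrays.asList("row2", "row4"));
        System.out.println(naiveAggregate(kvset, snapshot)); // [row2, row4, row1, row3]
        System.out.println(mergedScan(kvset, snapshot));     // [row1, row2, row3, row4]
    }
}
```

A test along these lines in TestMemStore would put rows in both the snapshot 
and kvset and then assert that each row returned by the scanner compares 
greater than or equal to the previous one.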



> New MemStoreScanner copies memstore for each scan, makes short scans slow
> -------------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>             Fix For: 0.20.4
>
>         Attachments: hbase-2248.gc, Screen shot 2010-02-23 at 10.33.38 
> AM.png, threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a 
> ConcurrentSkipListMap.buildFromSorted clone of the memstore and snapshot when 
> starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short 
> scans.  Some of our data represent a time series.   The data is stored in time 
> series order, MR jobs often insert/update new data at the end of the series, 
> and queries usually have to pick up some or all of the series.  These are 
> often scans of 0-100 rows at a time.  To load one page, we'll observe about 
> 20 such scans being triggered concurrently, and they take 2 seconds to 
> complete.  Doing a thread dump of a region server shows many threads in 
> ConcurrentSkipListMap.buildFromSorted which traverses the entire map of key 
> values to copy it.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
