[jira] Updated: (HBASE-2248) Provide new non-copy mechanism to assure atomic reads in get and scan

ryan rawson (JIRA) Mon, 05 Apr 2010 17:52:52 -0700

     [ https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ryan rawson updated HBASE-2248:
-------------------------------

    Attachment: HBASE-2248-rr-alpha2.txt

Here is an update to my patch.  This time I am using iterators to scan the 
memstore and snapshot.  It also includes fixes for a number of fun race 
conditions.

The best news: this is the fastest memstore scanner HBase has seen.  It is 
about 15x faster than the 0.20.3 version based on the microbenchmark included 
in the patch.  The old code takes about 400-500ms to scan 250k KeyValues in 
memstore, and this new patch takes 25-30ms.

I haven't run all the tests yet, but it passes the core TestMemStore and 
TestHRegion, which contain all the hard concurrency tests.
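
To illustrate why iterating in place can beat cloning, here is a minimal 
sketch that is not taken from the patch.  It assumes a plain 
ConcurrentSkipListMap<String, String> standing in for the memstore; the class 
and method names (MemstoreScanSketch, scanByCopy, scanByIterator) are 
hypothetical.  The copy-based scan pays for visiting every entry before it 
returns a single row, while the iterator-based scan seeks straight to the 
start key.  Note that a skip-list iterator is only weakly consistent, so this 
sketch by itself does not provide the atomic reads this issue is about; the 
patch needs additional machinery for that, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for a memstore: a sorted concurrent map of row key -> value.
final class MemstoreScanSketch {

    // Copy-based scan: clone the whole map, then read from the private copy.
    // The clone visits every entry, so the cost grows with the size of the
    // map even when the scan returns only a handful of rows.
    public static List<String> scanByCopy(ConcurrentSkipListMap<String, String> store,
                                          String startKey, int limit) {
        ConcurrentSkipListMap<String, String> copy = store.clone();
        return collect(copy, startKey, limit);
    }

    // Iterator-based scan: seek to startKey on the live map and walk forward.
    // The iterator is weakly consistent: it never throws
    // ConcurrentModificationException, but it may or may not reflect
    // concurrent writes made after it was created.
    public static List<String> scanByIterator(ConcurrentSkipListMap<String, String> store,
                                              String startKey, int limit) {
        return collect(store, startKey, limit);
    }

    // Collect up to 'limit' keys that are >= startKey, in sorted order.
    private static List<String> collect(ConcurrentSkipListMap<String, String> map,
                                        String startKey, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : map.tailMap(startKey, true).entrySet()) {
            if (out.size() >= limit) {
                break;
            }
            out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        // 250k entries, matching the size quoted for the microbenchmark above.
        ConcurrentSkipListMap<String, String> store = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 250_000; i++) {
            store.put(String.format("row%06d", i), "value");
        }

        long t0 = System.nanoTime();
        List<String> viaCopy = scanByCopy(store, "row200000", 100);
        long t1 = System.nanoTime();
        List<String> viaIterator = scanByIterator(store, "row200000", 100);
        long t2 = System.nanoTime();

        // Timings vary by machine and JVM warm-up; this is not a rigorous benchmark.
        System.out.println("copy scan:     " + viaCopy.size() + " rows, "
                + (t1 - t0) / 1_000_000.0 + " ms");
        System.out.println("iterator scan: " + viaIterator.size() + " rows, "
                + (t2 - t1) / 1_000_000.0 + " ms");
    }
}
```

Both methods return the same rows when nothing is writing concurrently; the 
difference is only in how much of the map each one has to touch first.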

> Provide new non-copy mechanism to assure atomic reads in get and scan
> ---------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>            Priority: Blocker
>             Fix For: 0.20.4
>
>         Attachments: HBASE-2248-demonstrate-previous-impl-bugs.patch, 
> HBASE-2248-GetsAsScans3.patch, HBASE-2248-rr-alpha1.txt, 
> HBASE-2248-rr-alpha2.txt, HBASE-2248-ryan.patch, hbase-2248.gc, 
> HBASE-2248.patch, hbase-2248.txt, readownwrites-lost.2.patch, 
> readownwrites-lost.patch, Screen shot 2010-02-23 at 10.33.38 AM.png, 
> threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a 
> ConcurrentSkipListMap.buildFromSorted clone of the memstore and snapshot when 
> starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short 
> scans.  Some of our data represent a time series.  The data is stored in time 
> series order, MR jobs often insert/update new data at the end of the series, 
> and queries usually have to pick up some or all of the series.  These are 
> often scans of 0-100 rows at a time.  To load one page, we'll observe about 
> 20 such scans being triggered concurrently, and they take 2 seconds to 
> complete.  Doing a thread dump of a region server shows many threads in 
> ConcurrentSkipListMap.buildFromSorted which traverses the entire map of key 
> values to copy it.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
