[ 
https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2248:
-------------------------

    Attachment: HBASE-2248.patch

Here is an attempt.  Tests pass.  Posting for review.  Load tests still need 
to be run.

"- Added a (transient) int updateId to KeyValue
- Memstore populates it on Adds and Deletes
- When a MemstoreScanner is created it grabs the current id (actually 
increments it to make sure no KV has that same id) and ignores records from 
kvset having an id greater than the one grabbed. Snapshots are scanned in full 
since they're not updated during the scanner's lifetime, hence there's no risk 
of partial updates being visible.  There may be an issue with deletes becoming 
partly visible in this scheme; I'll check that later."
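
For anyone reviewing, the scheme described above can be sketched roughly as
follows.  This is not the code in the patch: Kv, Memstore, Scanner and the
lastId counter are hypothetical stand-ins, keys are assumed distinct, and the
assumption that every write takes a fresh id is mine.  Deletes would be
stamped the same way as adds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

public class UpdateIdSketch {
    /** Hypothetical stand-in for KeyValue: a value plus the transient update id. */
    public static final class Kv {
        final String value;
        final int updateId;

        Kv(String value, int updateId) {
            this.value = value;
            this.updateId = updateId;
        }
    }

    /** Hypothetical stand-in for the memstore's live kvset. */
    public static final class Memstore {
        private final ConcurrentSkipListMap<String, Kv> kvset = new ConcurrentSkipListMap<>();
        private final AtomicInteger lastId = new AtomicInteger();

        /** Stamps each write with a fresh id (assumption: one id per write). */
        public void add(String key, String value) {
            kvset.put(key, new Kv(value, lastId.incrementAndGet()));
        }

        /** The scanner increments the counter too, so no KV shares its id. */
        public Scanner newScanner() {
            return new Scanner(kvset, lastId.incrementAndGet());
        }
    }

    /** Reads the live map without copying it, skipping writes newer than readId. */
    public static final class Scanner {
        private final ConcurrentSkipListMap<String, Kv> kvset;
        private final int readId;

        Scanner(ConcurrentSkipListMap<String, Kv> kvset, int readId) {
            this.kvset = kvset;
            this.readId = readId;
        }

        /** Returns the keys visible to this scanner, in sorted order. */
        public List<String> scan() {
            List<String> visible = new ArrayList<>();
            for (Map.Entry<String, Kv> e : kvset.entrySet()) {
                if (e.getValue().updateId <= readId) {
                    visible.add(e.getKey());
                }
            }
            return visible;
        }
    }
}
```

A scanner created before a write never returns that write, while a scanner
created afterwards does.  The sketch does not address the open question about
deletes becoming partly visible.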


> New MemStoreScanner copies memstore for each scan, makes short scans slow
> -------------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>             Fix For: 0.20.4
>
>         Attachments: HBASE-2248-demonstrate-previous-impl-bugs.patch, 
> hbase-2248.gc, HBASE-2248.patch, Screen shot 2010-02-23 at 10.33.38 AM.png, 
> threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a 
> ConcurrentSkipListMap.buildFromSorted clone of the memstore and snapshot when 
> starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short 
> scans.  Some of our data represent a time series.  The data is stored in time 
> series order, MR jobs often insert/update new data at the end of the series, 
> and queries usually have to pick up some or all of the series.  These are 
> often scans of 0-100 rows at a time.  To load one page, we'll observe about 
> 20 such scans being triggered concurrently, and they take 2 seconds to 
> complete.  Doing a thread dump of a region server shows many threads in 
> ConcurrentSkipListMap.buildFromSorted, which traverses the entire map of key 
> values to copy it.  
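
To illustrate why the copy hurts short scans, here is a minimal sketch
contrasting a copy-per-scan read with a read over a tailMap view.  The class
and method names are hypothetical and this is not HBase code; it only uses
the standard ConcurrentSkipListMap API.  The view avoids the full traversal,
but it is weakly consistent, so it can observe concurrent writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ShortScanSketch {
    /** Copy-per-scan: clones the whole map first, so cost grows with map size. */
    public static List<String> scanWithCopy(
            ConcurrentSkipListMap<String, String> memstore, String startRow, int limit) {
        ConcurrentSkipListMap<String, String> copy = memstore.clone(); // visits every entry
        return firstKeys(copy.tailMap(startRow, true), limit);
    }

    /** View-based scan: seeks to startRow and reads only the rows it returns. */
    public static List<String> scanWithView(
            ConcurrentSkipListMap<String, String> memstore, String startRow, int limit) {
        return firstKeys(memstore.tailMap(startRow, true), limit);
    }

    /** Collects up to limit keys from the given map in sorted order. */
    private static List<String> firstKeys(ConcurrentNavigableMap<String, String> map, int limit) {
        List<String> out = new ArrayList<>();
        for (String key : map.keySet()) {
            if (out.size() >= limit) {
                break;
            }
            out.add(key);
        }
        return out;
    }
}
```

Both methods return the same rows on a quiescent map; only the first pays for
a traversal of every key value before returning a handful of rows.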

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
