[ 
https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2248:
-------------------------

    Attachment: HBASE-2248-GetsAsScans3.patch

Here is a different take for review and input on how to solve this issue.  

Get is now implemented using Scan. I deleted lots of get-related classes/code,
including the QueryMatcher. Deletes no longer remove KVs from the memstore.
Filtering deleted KVs on flush is not done in this patch; it can be done in
another issue.  Maybe we don't want to filter deleted KVs on flush but rather
on minor compactions, for instance (the axiom that a file holds only deletes
that pertain to values held in the storefiles that follow may not be
necessary when gets are implemented using scans?).
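To make the idea concrete, here is a minimal sketch of a get expressed as a single-row scan over a sorted set that still contains delete markers. It is not the code in the patch: the Cell class, the ORDER comparator, and the get method are hypothetical stand-ins, and the delete semantics (a marker masks puts in the same column at or below its timestamp) are an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

public class GetAsScanSketch {
    /** Simplified stand-in for a KeyValue; hypothetical, not the HBase class. */
    static final class Cell {
        final String row, column, value;
        final long ts;            // assumed non-negative
        final boolean isDelete;   // true for a delete marker
        Cell(String row, String column, long ts, boolean isDelete, String value) {
            this.row = row; this.column = column; this.ts = ts;
            this.isDelete = isDelete; this.value = value;
        }
    }

    // Order by row, then column, then newest timestamp first; at equal
    // timestamps a delete marker sorts ahead of the put it masks.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.column.compareTo(b.column);
        if (c == 0) c = Long.compare(b.ts, a.ts);
        if (c == 0) c = Boolean.compare(b.isDelete, a.isDelete);
        return c;
    };

    /** A "get": scan one row and apply delete markers at read time. */
    static List<Cell> get(ConcurrentSkipListSet<Cell> memstore, String row) {
        List<Cell> result = new ArrayList<>();
        String column = null;
        long deletedUpTo = -1;    // highest delete timestamp seen in this column
        boolean emitted = false;  // only the newest live version is returned
        // Seek to the smallest possible key on the row, then walk forward.
        Cell firstOnRow = new Cell(row, "", Long.MAX_VALUE, true, null);
        for (Cell c : memstore.tailSet(firstOnRow, true)) {
            if (!c.row.equals(row)) break;            // past the row: scan ends
            if (!c.column.equals(column)) {           // new column: reset state
                column = c.column; deletedUpTo = -1; emitted = false;
            }
            if (c.isDelete) { deletedUpTo = Math.max(deletedUpTo, c.ts); continue; }
            if (emitted || c.ts <= deletedUpTo) continue;  // masked or older version
            result.add(c);
            emitted = true;
        }
        return result;
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Cell> memstore = new ConcurrentSkipListSet<>(ORDER);
        memstore.add(new Cell("r1", "a", 1, false, "old"));
        memstore.add(new Cell("r1", "a", 2, true, null));   // delete stays in the set
        memstore.add(new Cell("r1", "b", 3, false, "v3"));
        for (Cell c : get(memstore, "r1")) System.out.println(c.column + "=" + c.value);
    }
}
```

Because the delete marker is left in place and applied while scanning, nothing has to be removed from the set at delete time, which is the property the paragraph above relies on.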



Things left to do:
- Performance test
- More accurate heap size calculation for HRegion
- Discuss where/when deletes should be partially applied

Here is more detail on what this change includes:

M       src/contrib/indexed/src/java/org/apache/hadoop/hbase/regionserver/IdxRegion.java
 minor tweak due to MemStore#getScanners signature change

M       src/java/org/apache/hadoop/hbase/HConstants.java
 Appended EMPTY_KEY_VALUE_UPDATE_ID to stand for an unset update id

M       src/java/org/apache/hadoop/hbase/KeyValue.java
 Added a transient int updateId + accessors + heap size adjustment
 Added a createLastOnRow method (similar to create first on row) and
made sure the comparator treats this case symmetrically

M       src/java/org/apache/hadoop/hbase/client/Scan.java
 Added a constructor which accepts a Get and creates a matching scan +
a convenience method isGetScan

M       src/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 Modified references to QueryMatcher to refer to ScanQueryMatcher

D       src/java/org/apache/hadoop/hbase/regionserver/DeleteCompare.java
M       src/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 QueryMatcher -> ScanQueryMatcher

M       src/java/org/apache/hadoop/hbase/regionserver/GetClosestRowBeforeTracker.java
 QueryMatcher -> ScanQueryMatcher

D       src/java/org/apache/hadoop/hbase/regionserver/GetDeleteTracker.java
M       src/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 Added a member of type RegionUpdateTracker which is initialized both
in the constructor and on every flush + heap size adjustment
 Scans are now paused while flushers prepare (e.g. snapshots are taken)
 Old gets replaced with new get implementation (which uses scans)
 #getClosestRowBefore is now using HRegion#get instead of Store#get
 #delete(Delete,Integer,boolean) no longer acquires a newScannerLock
and also tracks update ids using the update tracker
 #delete(byte[],List,boolean) changed from protected to package access
since it's used as an internal HRegion subroutine and accessed a few times
by tests. It also no longer acquires the update lock
 #put no longer acquires newScannerLock; also modified to track update ids
 RegionScanner stop-row logic was adjusted to support get scans. Also,
RegionUpdateTracker#UpdateIdValidator is now acquired and passed down
to store scanners

M       src/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java
 no longer Cloneable

M       src/java/org/apache/hadoop/hbase/regionserver/MemStore.java
 deleted lots of unneeded logic, mainly around deletes (very much
simplified) and gets (no longer needed)

M       src/java/org/apache/hadoop/hbase/regionserver/MemStoreScanner.java
 Modified to consider UpdateIdValidator for kvset KeyValues. Snapshot
KVs are reset to the undefined update id so that Store#updateColumnValue
remains backward compatible

D       src/java/org/apache/hadoop/hbase/regionserver/QueryMatcher.java
A       src/java/org/apache/hadoop/hbase/regionserver/RegionUpdateTracker.java
 Tracks updates to HRegions. See javadoc.

M       src/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java
 Fixed to throw an exception as the comment suggests

M       src/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 Merged with the deleted QueryMatcher. Added a slight variant for get
scans to use 'lastInRows'

M       src/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 QueryMatcher -> ScanQueryMatcher

M       src/java/org/apache/hadoop/hbase/regionserver/Store.java
 #getScanner now accepts an UpdateIdValidator
 #get deleted
 #updateColumnValue modified to use scans and not memstore#getWithCode
(which was deleted)

D       src/java/org/apache/hadoop/hbase/regionserver/StoreFileGetScan.java
M       src/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 QueryMatcher.MatchCode -> ScanQueryMatcher.MatchCode
 passing around of the UpdateIdValidator

D       src/java/org/apache/hadoop/hbase/regionserver/WildcardColumnTracker.java
M       src/test/org/apache/hadoop/hbase/TestKeyValue.java
M       src/test/org/apache/hadoop/hbase/client/TestClient.java
M       src/test/org/apache/hadoop/hbase/io/TestHeapSize.java
D       src/test/org/apache/hadoop/hbase/regionserver/TestDeleteCompare.java
M       src/test/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
D       src/test/org/apache/hadoop/hbase/regionserver/TestGetDeleteTracker.java
M       src/test/org/apache/hadoop/hbase/regionserver/TestHRegion.java
M       src/test/org/apache/hadoop/hbase/regionserver/TestMemStore.java
D       src/test/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
A       src/test/org/apache/hadoop/hbase/regionserver/TestRegionUpdateTracker.java
M       src/test/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
M       src/test/org/apache/hadoop/hbase/regionserver/TestStore.java
D       src/test/org/apache/hadoop/hbase/regionserver/TestWildcardColumnTracker.java
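
For readers trying to picture how the update ids in the RegionUpdateTracker and UpdateIdValidator entries above could fit together, here is one possible reading as a rough sketch. It is an assumption, not the patch's code: the class, the method names (begin, complete, newValidator, isVisible), and the rule that an unset id is always visible are all hypothetical. The idea sketched is that each put or delete takes an id when it starts, and a scanner takes a validator snapshot so it only sees KVs from updates that had finished by then.

```java
import java.util.HashSet;
import java.util.Set;

public class UpdateTrackerSketch {
    /** Hypothetical read-side view handed to a scanner. */
    interface Validator { boolean isVisible(int updateId); }

    // Hypothetical stand-in for "no update id set" (e.g. snapshot KVs).
    static final int UNSET_UPDATE_ID = 0;

    private int nextId = 1;
    private final Set<Integer> inFlight = new HashSet<>();

    /** Called when a put or delete starts; its KVs would carry this id. */
    synchronized int begin() {
        int id = nextId++;
        inFlight.add(id);
        return id;
    }

    /** Called once all KVs of the update are in the memstore. */
    synchronized void complete(int id) {
        inFlight.remove(id);
    }

    /** Snapshot of which updates a new scanner is allowed to see. */
    synchronized Validator newValidator() {
        final int ceiling = nextId;                          // ids issued later are hidden
        final Set<Integer> pending = new HashSet<>(inFlight); // still running now
        return id -> id == UNSET_UPDATE_ID
                || (id < ceiling && !pending.contains(id));
    }

    public static void main(String[] args) {
        UpdateTrackerSketch tracker = new UpdateTrackerSketch();
        int done = tracker.begin();
        int running = tracker.begin();
        tracker.complete(done);
        Validator v = tracker.newValidator();
        System.out.println("finished update visible: " + v.isVisible(done));
        System.out.println("in-flight update visible: " + v.isVisible(running));
    }
}
```

In this sketch a scanner never copies the memstore; it filters KVs by id as it iterates, so a half-applied update stays invisible until a later validator is taken.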

> Provide new non-copy mechanism to assure atomic reads in get and scan
> ---------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>             Fix For: 0.20.4
>
>         Attachments: HBASE-2248-demonstrate-previous-impl-bugs.patch, 
> HBASE-2248-GetsAsScans3.patch, HBASE-2248-ryan.patch, hbase-2248.gc, 
> HBASE-2248.patch, hbase-2248.txt, readownwrites-lost.2.patch, 
> readownwrites-lost.patch, Screen shot 2010-02-23 at 10.33.38 AM.png, 
> threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a 
> ConcurrentSkipListMap.buildFromSorted clone of the memstore and snapshot when 
> starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short 
> scans.  Some of our data represent a time series.  The data is stored in time 
> series order, MR jobs often insert/update new data at the end of the series, 
> and queries usually have to pick up some or all of the series.  These are 
> often scans of 0-100 rows at a time.  To load one page, we'll observe about 
> 20 such scans being triggered concurrently, and they take 2 seconds to 
> complete.  Doing a thread dump of a region server shows many threads in 
> ConcurrentSkipListMap.buildFromSorted which traverses the entire map of key 
> values to copy it.  
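
The cost described in the report can be illustrated with plain JDK classes. The sketch below is not HBase code: the class and method names are made up, and String keys stand in for KeyValues. It contrasts cloning a ConcurrentSkipListMap before a short scan (which copies every entry) with scanning a tailMap view (which copies nothing but can observe concurrent writes, hence the need for some other way to keep reads atomic).

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListCopyVsView {
    /** Copying approach: clone() rebuilds the whole map before the scan starts. */
    static int scanByClone(ConcurrentSkipListMap<String, String> memstore,
                           String startRow, int limit) {
        ConcurrentSkipListMap<String, String> copy = memstore.clone(); // O(map size)
        return scan(copy.tailMap(startRow, true), limit);
    }

    /** Non-copy approach: iterate a live view that starts at startRow. */
    static int scanByView(ConcurrentSkipListMap<String, String> memstore,
                          String startRow, int limit) {
        return scan(memstore.tailMap(startRow, true), limit);
    }

    /** Reads at most 'limit' rows and returns how many were read. */
    private static int scan(ConcurrentNavigableMap<String, String> rows, int limit) {
        int n = 0;
        for (String ignored : rows.keySet()) {
            if (n == limit) break;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 200_000; i++) {
            memstore.put(String.format("row%06d", i), "v");
        }
        long t0 = System.nanoTime();
        int viaClone = scanByClone(memstore, "row199950", 100);
        long t1 = System.nanoTime();
        int viaView = scanByView(memstore, "row199950", 100);
        long t2 = System.nanoTime();
        System.out.println("clone: " + viaClone + " rows, " + (t1 - t0) / 1000 + " us");
        System.out.println("view:  " + viaView + " rows, " + (t2 - t1) / 1000 + " us");
    }
}
```

Both calls return the same rows when nothing is writing; the difference is that the cloning variant pays for the full map on every short scan.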

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.