[
https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-2248:
-------------------------
Attachment: HBASE-2248-GetsAsScans3.patch
Here is a different take on solving this issue, for review and input.
Get is now implemented using Scan. I deleted a lot of get-related classes and
code, including the QueryMatcher. Deletes no longer remove KVs from the
memstore. Filtering deleted KVs on flush is not done in this patch; it can be
done in another issue. Maybe we don't want to filter deleted KVs on flush
but rather on minor compactions, for instance. (The axiom that a file holds
only deletes that pertain to values held in the storefiles that follow may not
be necessary when gets are implemented using scans?)
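To make the Get-as-Scan idea concrete, here is a minimal sketch. It is not the code in the patch: the Get, Scan and store types below are simplified stand-ins (String rows, a sorted map as the store), and the mapping "a Get becomes a Scan whose start row and stop row are the same row" is an assumption about how the new Scan(Get) constructor and isGetScan could work. The one real difference from an ordinary scan is that the stop row has to be treated as inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, simplified stand-ins. These are NOT the HBase classes.
final class GetAsScanSketch {

    static final class Get {
        final String row;
        Get(String row) { this.row = row; }
    }

    static final class Scan {
        final String startRow; // inclusive
        final String stopRow;  // exclusive for ordinary scans, "" means unbounded

        Scan(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        // Assumed mapping: a Get is a Scan that starts and stops on the same row.
        Scan(Get get) { this(get.row, get.row); }

        boolean isGetScan() {
            return !startRow.isEmpty() && startRow.equals(stopRow);
        }
    }

    // Walks the sorted store from startRow and stops at stopRow.
    // For a get scan the stop row is inclusive, otherwise it is exclusive.
    static List<String> scan(NavigableMap<String, String> store, Scan scan) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : store.tailMap(scan.startRow, true).entrySet()) {
            int cmp = scan.stopRow.isEmpty() ? -1 : e.getKey().compareTo(scan.stopRow);
            boolean pastStop = scan.isGetScan() ? cmp > 0 : cmp >= 0;
            if (pastStop) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    // A get is a scan that returns at most one row.
    static String get(NavigableMap<String, String> store, Get get) {
        List<String> rows = scan(store, new Scan(get));
        return rows.isEmpty() ? null : rows.get(0);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        store.put("row-a", "1");
        store.put("row-b", "2");
        System.out.println(get(store, new Get("row-b")));
        System.out.println(scan(store, new Scan("row-a", "row-b")));
    }
}
```

With a single read path like this, delete handling only has to be right in the scan code, which is what makes dropping the separate get-side matcher and delete tracker possible.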
Things left to do:
- Performance test
- More accurate heap size calculation for HRegion
- Discuss where/when deletes should be partially applied
Here is more detail on what this change includes:
M src/contrib/indexed/src/java/org/apache/hadoop/hbase/regionserver/IdxRegion.java
minor tweak due to Memstore#getScanners signature change
M src/java/org/apache/hadoop/hbase/HConstants.java
Appended EMPTY_KEY_VALUE_UPDATE_ID to stand for an unset update id
M src/java/org/apache/hadoop/hbase/KeyValue.java
Added a transient int updateId + accessors + heap size adjustment
Added a createLastOnRow method (similar to create first on row) and
made sure the comparator treats this case symmetrically
M src/java/org/apache/hadoop/hbase/client/Scan.java
Added a constructor which accepts a Get and creates a matching scan +
a convenience method isGetScan
M src/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
Modified references to QueryMatcher to refer to ScanQueryMatcher
D src/java/org/apache/hadoop/hbase/regionserver/DeleteCompare.java
M src/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
QueryMatcher -> ScanQueryMatcher
M src/java/org/apache/hadoop/hbase/regionserver/GetClosestRowBeforeTracker.java
QueryMatcher -> ScanQueryMatcher
D src/java/org/apache/hadoop/hbase/regionserver/GetDeleteTracker.java
M src/java/org/apache/hadoop/hbase/regionserver/HRegion.java
Added a member of type RegionUpdateTracker which is initialized both
in the constructor and on every flush + heap size adjustment
Scans are now paused while flushers prepare (e.g. snapshots are taken)
Old gets replaced with new get implementation (which uses scans)
#getClosestRowBefore now uses HRegion#get instead of Store#get
#delete(Delete,Integer,boolean) no longer acquires newScannerLock
and also tracks update ids using the update tracker
#delete(byte[],List,boolean) changed from protected to package-private
since it's used as an internal HRegion subroutine and accessed a few
times by tests. It also no longer acquires the update lock
#put no longer acquires newScannerLock; also modified to track update ids
RegionScanner stop-row logic was adjusted to support get scans. Also,
a RegionUpdateTracker#UpdateIdValidator is now acquired and passed down
to store scanners
M src/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java
no longer Cloneable
M src/java/org/apache/hadoop/hbase/regionserver/MemStore.java
Deleted lots of unneeded logic, mainly around deletes (much
simplified) and gets (no longer needed)
M src/java/org/apache/hadoop/hbase/regionserver/MemStoreScanner.java
Modified to consider UpdateIdValidator for kvset KeyValues. Snapshot
KVs are reset to the undefined update id so that Store#updateColumnValue
remains backward compatible
D src/java/org/apache/hadoop/hbase/regionserver/QueryMatcher.java
A src/java/org/apache/hadoop/hbase/regionserver/RegionUpdateTracker.java
Tracks updates to HRegions. See javadoc.
M src/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java
Fixed to throw an exception as comment suggests
M src/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
Merged with the deleted QueryMatcher. Added a slight variant for get
scans to use 'lastInRows'
M src/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
QueryMatcher -> ScanQueryMatcher
M src/java/org/apache/hadoop/hbase/regionserver/Store.java
#getScanner now accepts an UpdateIdValidator
#get deleted
#updateColumnValue modified to use scans and not memstore#getWithCode
(which was deleted)
D src/java/org/apache/hadoop/hbase/regionserver/StoreFileGetScan.java
M src/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
QueryMatcher.MatchCode -> ScanQueryMatcher.MatchCode
passing around of the UpdateIdValidator
D src/java/org/apache/hadoop/hbase/regionserver/WildcardColumnTracker.java
M src/test/org/apache/hadoop/hbase/TestKeyValue.java
M src/test/org/apache/hadoop/hbase/client/TestClient.java
M src/test/org/apache/hadoop/hbase/io/TestHeapSize.java
D src/test/org/apache/hadoop/hbase/regionserver/TestDeleteCompare.java
M src/test/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
D src/test/org/apache/hadoop/hbase/regionserver/TestGetDeleteTracker.java
M src/test/org/apache/hadoop/hbase/regionserver/TestHRegion.java
M src/test/org/apache/hadoop/hbase/regionserver/TestMemStore.java
D src/test/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
A src/test/org/apache/hadoop/hbase/regionserver/TestRegionUpdateTracker.java
M src/test/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
M src/test/org/apache/hadoop/hbase/regionserver/TestStore.java
D src/test/org/apache/hadoop/hbase/regionserver/TestWildcardColumnTracker.java
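For readers who have not opened the patch, the update-id idea can be sketched roughly as below. This is an illustration under assumptions, not the RegionUpdateTracker in the patch (see its javadoc for the real semantics). The class name, the method names beginUpdate/completeUpdate/newValidator and the rule "a scanner sees a KeyValue only if its update had completed when the validator was created" are all hypothetical. The constant stands in for EMPTY_KEY_VALUE_UPDATE_ID and its value here is made up.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of update-id tracking. NOT the patch's RegionUpdateTracker.
final class UpdateTrackerSketch {

    // Stand-in for EMPTY_KEY_VALUE_UPDATE_ID; the value is an assumption.
    static final int EMPTY_UPDATE_ID = -1;

    private int nextId = 0;
    private final Set<Integer> inProgress = new HashSet<>();

    // Called when a put/delete starts; every KV it writes is stamped with this id.
    synchronized int beginUpdate() {
        int id = nextId++;
        inProgress.add(id);
        return id;
    }

    // Called when all KVs of the update are in the memstore.
    synchronized void completeUpdate(int id) {
        inProgress.remove(id);
    }

    // Called when a scanner is opened; captures which updates are visible to it.
    synchronized Validator newValidator() {
        return new Validator(nextId, new HashSet<>(inProgress));
    }

    static final class Validator {
        private final int limit;            // ids >= limit started after the scanner
        private final Set<Integer> pending; // ids still in flight when the scanner opened

        Validator(int limit, Set<Integer> pending) {
            this.limit = limit;
            this.pending = pending;
        }

        // A KV with no update id (e.g. one read from a snapshot) is always visible.
        boolean isValid(int updateId) {
            if (updateId == EMPTY_UPDATE_ID) {
                return true;
            }
            return updateId < limit && !pending.contains(updateId);
        }
    }

    public static void main(String[] args) {
        UpdateTrackerSketch tracker = new UpdateTrackerSketch();
        int done = tracker.beginUpdate();
        tracker.completeUpdate(done);
        int inFlight = tracker.beginUpdate();
        Validator v = tracker.newValidator();
        System.out.println(v.isValid(done));
        System.out.println(v.isValid(inFlight));
    }
}
```

The point of a scheme like this is that a scanner can skip KeyValues from half-applied or later updates while iterating the live kvset, so no copy of the memstore is needed to get an atomic view.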
> Provide new non-copy mechanism to assure atomic reads in get and scan
> ---------------------------------------------------------------------
>
> Key: HBASE-2248
> URL: https://issues.apache.org/jira/browse/HBASE-2248
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.3
> Reporter: Dave Latham
> Fix For: 0.20.4
>
> Attachments: HBASE-2248-demonstrate-previous-impl-bugs.patch,
> HBASE-2248-GetsAsScans3.patch, HBASE-2248-ryan.patch, hbase-2248.gc,
> HBASE-2248.patch, hbase-2248.txt, readownwrites-lost.2.patch,
> readownwrites-lost.patch, Screen shot 2010-02-23 at 10.33.38 AM.png,
> threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a
> ConcurrentSkipListMap.buildFromSorted clone of the memstore and snapshot when
> starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short
> scans. Some of our data represents a time series. The data is stored in time
> series order, MR jobs often insert/update new data at the end of the series,
> and queries usually have to pick up some or all of the series. These are
> often scans of 0-100 rows at a time. To load one page, we'll observe about
> 20 such scans being triggered concurrently, and they take 2 seconds to
> complete. Doing a thread dump of a region server shows many threads in
> ConcurrentSkipListMap.buildFromSorted which traverses the entire map of key
> values to copy it.
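The cost described in the report can be illustrated with plain java.util.concurrent, outside HBase. The sketch below is an assumption-laden toy (integer keys, the class and method names are made up): one short scan clones the whole map first, as the report describes, and the other reads through a tailMap view of the live map. Both return the same rows when nothing is writing, but the clone touches every entry before the first row is returned, while the view is only weakly consistent under concurrent writes, which is why a separate non-copy mechanism for atomic reads is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy illustration only; the class and method names are hypothetical.
final class ShortScanSketch {

    // Copy-then-scan: clone() copies every entry of the map before the scan starts.
    static List<Integer> scanViaClone(ConcurrentSkipListMap<Integer, Integer> map,
                                      int startKey, int limit) {
        ConcurrentSkipListMap<Integer, Integer> copy = map.clone();
        return firstN(copy.tailMap(startKey, true).values(), limit);
    }

    // Non-copy scan: iterate a view of the live map (weakly consistent).
    static List<Integer> scanViaView(ConcurrentSkipListMap<Integer, Integer> map,
                                     int startKey, int limit) {
        return firstN(map.tailMap(startKey, true).values(), limit);
    }

    private static List<Integer> firstN(Iterable<Integer> values, int limit) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : values) {
            if (out.size() >= limit) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, Integer> map = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(i, i);
        }
        System.out.println(scanViaClone(map, 99_990, 5));
        System.out.println(scanViaView(map, 99_990, 5));
    }
}
```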