[
https://issues.apache.org/jira/browse/HBASE-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jim Kellerman updated HBASE-613:
--------------------------------
Attachment: 613.patch
HAbstractScanner
- remove HAbstractScanner.iterator() - iterator is not a method on
InternalScanner
HRegion
- make getScanner more efficient by iterating only once to find the stores we
need to scan
- only pass columns relevant to a store to a HStoreScanner
- remove HScanner.iterator() - iterator is not a method on InternalScanner
MemcacheScanner
- never return HConstants.LATEST_TIMESTAMP as the timestamp value for a row.
Instead use the largest timestamp from the cells being returned. This allows a
scanner to determine a timestamp that can be used to fetch the same data again
should new versions be inserted later.
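The timestamp choice above can be sketched as follows. This is an illustrative fragment only, not the actual MemcacheScanner code: the class and method names (RowTimestamp, chooseRowTimestamp) are invented for the example, and the real scanner operates on cells rather than a bare list of longs.

```java
import java.util.List;

// Hypothetical sketch: report the largest concrete cell timestamp as the
// row's timestamp, never the LATEST_TIMESTAMP sentinel, so a caller can
// re-fetch the same data even if newer versions are inserted later.
class RowTimestamp {
    // Sentinel meaning "most recent version", as in HConstants.LATEST_TIMESTAMP.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Return the largest timestamp among the cells returned for a row.
    static long chooseRowTimestamp(List<Long> cellTimestamps) {
        long max = Long.MIN_VALUE;
        for (long ts : cellTimestamps) {
            max = Math.max(max, ts);
        }
        return max; // a real timestamp, not the sentinel
    }
}
```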
StoreFileScanner
- getNextViableRow would find a row that matched the row key but did not
consider the requested timestamp. Now, if the row it finds has a timestamp
greater than the one requested, it advances to determine whether a version
with a timestamp less than or equal to the requested one exists, since
timestamps are sorted in descending order.
- removed an unnecessary else
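The getNextViableRow fix amounts to skipping versions that are too new. A minimal sketch, with invented names (ViableRowFinder, findViableVersion) standing in for the real StoreFileScanner logic, which walks HStoreKeys rather than a timestamp array:

```java
// Hypothetical sketch: versions of a cell are stored with timestamps sorted
// descending, so after matching the row key we must keep advancing past
// versions newer than the requested timestamp instead of giving up.
class ViableRowFinder {
    // Returns the index of the first version with timestamp <= requested,
    // or -1 if every stored version is newer than the requested timestamp.
    static int findViableVersion(long[] descendingTimestamps, long requested) {
        for (int i = 0; i < descendingTimestamps.length; i++) {
            if (descendingTimestamps[i] <= requested) {
                return i;
            }
        }
        return -1;
    }
}
```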
Timestamp
- The program that was used to find the problem and test the fix.
TestScanMultipleVersions
- Test program that fails on current trunk but passes when this patch is
applied.
NOTE: TestHRegionServerExit failed on both Windows and Linux, but
TestRegionRebalancing passed on Linux and failed on Windows.
All other tests passed, and when I ran TestScanMultipleVersions against
unpatched trunk, it failed.
Please review.
> Timestamp-anchored scanning fails to find all records
> -----------------------------------------------------
>
> Key: HBASE-613
> URL: https://issues.apache.org/jira/browse/HBASE-613
> Project: Hadoop HBase
> Issue Type: Bug
> Components: client
> Reporter: stack
> Assignee: Jim Kellerman
> Fix For: 0.2.0
>
> Attachments: 613.patch, nogood.patch, TestTimestampScanning.java,
> Timestamp.patch
>
>
> If I add 3 versions of a cell and then scan across the first set of added
> cells using a timestamp that should only get values from the first upload, a
> bunch are missing (I added 100k on each of the three uploads). I thought it
> was the fact that we set the number of cells found back to 1 in HStore when
> we move off the current row/column, but that doesn't seem to be it. I also
> tried upping the MAX_VERSIONS on my table, and that seemed to have no
> effect. Need to look closer.
> Build a unit test, because replicating on a cluster takes too much time.
--
This message is automatically generated by JIRA.