[
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784266#comment-15784266
]
Enis Soztutar commented on HBASE-17339:
---------------------------------------
The main problem is how to determine "ONLY if result is not complete". When
you get the result for a row from memory, some store file may still contain
a version with a higher timestamp, and you will miss it unless we have a
guarantee of monotonically increasing timestamps.
However, we already track min/max timestamps per store file, and we
have logic to eliminate scanners based on those timestamps. We can use this
algorithm for correctness:
{code}
1. open all relevant *memory* scanners
2. get results
3. if get returns a result
       check the timestamp against all remaining scanners
       (KeyValueScanner.shouldUseScanner());
       if all (hfile) scanners have lower timestamps, return results
4. otherwise
       open all scanners
       return results
{code}
This will ensure correctness without having to rely on a promise from the user.
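The check in step 3 can be sketched in Java as below. This is a minimal illustration, not HBase code: `MemoryFirstGet`, `FileTimeRange`, and `canSkipFiles` are hypothetical names standing in for the per-store-file min/max timestamps and the scanner elimination logic. It assumes the get asks for only the newest version and ignores delete markers.

```java
import java.util.List;

final class MemoryFirstGet {
    /** Hypothetical stand-in for the min/max timestamps tracked per store file. */
    static final class FileTimeRange {
        final long minTs; // kept for completeness; only maxTs is used below
        final long maxTs;

        FileTimeRange(long minTs, long maxTs) {
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /**
     * Returns true when a result found in memory can be returned without
     * opening any file scanner, i.e. every file's newest timestamp is strictly
     * lower than the timestamp of the memory result.
     * A null timestamp means the memory-only scan found nothing.
     */
    static boolean canSkipFiles(Long memoryResultTs, List<FileTimeRange> files) {
        if (memoryResultTs == null) {
            return false; // nothing in memory, so the files must be consulted
        }
        for (FileTimeRange f : files) {
            if (f.maxTs >= memoryResultTs) {
                return false; // this file may hold an equal or newer version
            }
        }
        return true;
    }
}
```

Treating an equal timestamp as "cannot skip" is a deliberately conservative choice in this sketch: it falls back to the full scan whenever the timestamps alone cannot decide which version wins.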
> Scan-Memory-First Optimization for Get Operation
> ------------------------------------------------
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
> Issue Type: Improvement
> Reporter: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch
>
>
> The current implementation of a get operation (to retrieve values for a
> specific key) scans through all relevant stores of the region; for each store
> both memory components (memstore segments) and disk components (hfiles) are
> scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only
> components first and, only if the result is incomplete, scans both memory and
> disk.