[ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882878#action_12882878 ]
Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------
Still need to look at the code some more. Thinking aloud, some options seem to
be:
(Note, as background, that we are planning to add HBASE-2265. So it would be
nice if the fix for this issue also took advantage of that optimization and
avoided a full row scan.)
#1) A Filter object with the list of versions you are interested in. But it
seems that with this approach you'll end up doing a full scan, checking each
row against the filter. There wouldn't be a way to exit early.
#2) Variant of #1. Additionally compute the min/max version from the passed-in
set of versions; use setTimeRange() to trim down the set of columns we look at;
and apply the filter against those columns. Still not a great approach if the
versions passed in are spread out too much.
#3) Do N point lookups (or 1-column scans), one version at a time (all in the
same server roundtrip, of course). I think it is still important to preserve
row-level consistency-- i.e. we should do a consistent read of all the
versions within a row. The stuff Ryan has done should probably make that easy,
but I don't know it too well yet.
#4) Implement a Batch Get[] API. The app would need to pass a List of Get
objects, all for the same row, and use setTimeStamp() to set the version
explicitly in each Get object. The trouble, though, is that the general case of
the Batch Get[] API doesn't have to support a consistent read across all Gets
in a batch; but for this case a consistent read would be the desired semantics.
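
To make the cost of #2 concrete, here is a minimal sketch in plain Java. It
is a toy model, not HBase code: one column's versions are held in a sorted map
keyed by timestamp, and the class and method names (RangeThenFilterSketch,
fetch, cellsExamined) are hypothetical. The bounded sub-map stands in for the
time range on the read, and is inclusive at both ends for simplicity.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: nothing here calls into HBase. Names are hypothetical.
class RangeThenFilterSketch {

    // Cells visited by the most recent fetch(), to expose the cost of the scan.
    static int cellsExamined = 0;

    // Option #2: bound the scan by the min/max of the wanted versions,
    // then filter every cell in that range against the wanted set.
    static NavigableMap<Long, String> fetch(NavigableMap<Long, String> column,
                                            Set<Long> wanted) {
        NavigableMap<Long, String> result = new TreeMap<>();
        cellsExamined = 0;
        if (wanted.isEmpty()) {
            return result;
        }
        Long min = Collections.min(wanted);
        Long max = Collections.max(wanted);
        // The bounded view plays the role of the time range on the read.
        for (Map.Entry<Long, String> e : column.subMap(min, true, max, true).entrySet()) {
            cellsExamined++;
            if (wanted.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

In this model the work grows with the width of the [min, max] range rather
than with the number of versions asked for, which is the weakness noted in #2
when the requested versions are spread far apart.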
I think #3 might be best overall. If there are 10000 versions of a cell, and
you are interested in versions 1 and 10000, then point lookups will be as
good as it gets-- and should fetch just the minimal blocks needed. If the
versions happen to be on the same block, even better-- the block should be warm
in the LRU cache. The case where this approach might not be as CPU efficient is
if the versions are fairly densely packed together, where a range scan (#2)
might have worked better. But for that case the app should probably be using
the setTimeRange() API instead.
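
As a rough illustration of the point-lookup idea in #3, here is a minimal
sketch in plain Java. It is a toy model, not HBase code: the column's versions
live in an in-memory map keyed by timestamp, and the names (PointLookupSketch,
fetch, lookups) are hypothetical. Row-level consistency and block caching are
not modelled; the sketch only shows that the work is proportional to the
number of versions requested.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: nothing here calls into HBase. Names are hypothetical.
class PointLookupSketch {

    // Lookups issued by the most recent fetch().
    static int lookups = 0;

    // Option #3: one point lookup per requested version, all in one call.
    static NavigableMap<Long, String> fetch(Map<Long, String> column, Set<Long> wanted) {
        NavigableMap<Long, String> result = new TreeMap<>();
        lookups = 0;
        for (Long ts : wanted) {
            lookups++;
            String value = column.get(ts);
            if (value != null) {
                // Versions that do not exist are simply absent from the result.
                result.put(ts, value);
            }
        }
        return result;
    }
}
```

With 10000 versions stored and only versions 1 and 10000 requested, this model
issues two lookups, whereas a scan bounded by the min/max of the same request
would walk every version in between.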
> Add ability to extract a specified list of versions of a column in a single
> roundtrip
> -------------------------------------------------------------------------------------
>
> Key: HBASE-2793
> URL: https://issues.apache.org/jira/browse/HBASE-2793
> Project: HBase
> Issue Type: New Feature
> Reporter: Kannan Muthukkaruppan
> Assignee: Kannan Muthukkaruppan
>
> In one of the use cases we were looking at, each row contains a single
> column, but with several versions (e.g., each version representing an event
> in a log), and we want to be able to extract a specific set of versions from
> the row in a single round-trip.
> Currently, on a Get, one can retrieve a specific version of a column using
> setTimeStamp(ts) or a range of versions using setTimeRange(min, max). But not
> a set of specified versions. It would be useful to add this ability.