[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Kannan Muthukkaruppan (JIRA) Sat, 26 Jun 2010 11:45:12 -0700

    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882878#action_12882878 ]


Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------

Still need to look at the code some more. Thinking aloud, some options seem to 
be:

(As background: we are planning to add HBASE-2265, so it would be nice if the 
fix for this issue also takes advantage of that optimization and avoids a full 
row scan.)

#1) A Filter object with the list of versions you are interested in. But with 
this approach it seems you'll end up doing a full scan and checking against 
the filter for each row. There wouldn't be a way to exit early.

#2) Variant of #1. Additionally compute the min/max version from the passed-in 
set of versions; use setTimeRange() to trim down the set of columns we look 
at; and apply the filter against those columns. Still not a great approach if 
the versions passed are spread out too much.

#3) Do N point lookups (or single-column scans), one version at a time (all in 
the same server roundtrip, of course). I think it is still important to 
preserve row-level consistency, i.e. we should do a consistent read of all the 
versions within a row. The stuff Ryan has done should probably make it easy. 
But I don't know this too well yet.

#4) Implement a batch Get[] API. The app would need to pass a List of Get 
objects, all for the same row, and use setTimeStamp() to set the version 
explicitly in each Get object. The trouble, though, is that the general case 
of the batch Get[] API doesn't have to support a consistent read across all 
Gets in a batch; but for this case a consistent read would be the desired 
semantics.
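
To make options #1 to #3 concrete, here is a minimal sketch in plain Java. It
is a toy model, not HBase code: a single column is assumed to be a sorted
in-memory map from timestamp to value, and the class and method names
(VersionFetchSketch, filterScan, rangeThenFilter, pointLookups) are
hypothetical. All three methods read the same map instance, which stands in
for a consistent view of the row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeSet;

/** Toy model: one column's versions as timestamp -> value. Not HBase code. */
final class VersionFetchSketch {

    /** Option #1: walk every version and keep those the filter accepts. */
    static List<String> filterScan(NavigableMap<Long, String> column, Set<Long> wanted) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : column.entrySet()) {
            if (wanted.contains(e.getKey())) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    /** Option #2: restrict to [min, max] of the wanted set, then filter. */
    static List<String> rangeThenFilter(NavigableMap<Long, String> column, TreeSet<Long> wanted) {
        if (wanted.isEmpty()) {
            return new ArrayList<>();
        }
        NavigableMap<Long, String> slice =
                column.subMap(wanted.first(), true, wanted.last(), true);
        return filterScan(slice, wanted);
    }

    /** Option #3: one point lookup per wanted version, all against the same map. */
    static List<String> pointLookups(NavigableMap<Long, String> column, TreeSet<Long> wanted) {
        List<String> out = new ArrayList<>();
        for (Long ts : wanted) {
            String value = column.get(ts);
            if (value != null) { // a requested version may not exist
                out.add(value);
            }
        }
        return out;
    }
}
```

In this model the three methods return the same values in ascending timestamp
order; they differ only in how much of the column they touch.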

I think #3 might be best overall. If there are 10000 versions of a cell, and 
you are interested in versions 1 and 10000, then point lookups will be as good 
as it gets-- and should fetch just the minimal blocks needed.  If the versions 
happen to be on the same block, even better-- the blocks should be warm in the 
LRU cache. The case where this approach might not be as CPU efficient is if 
the versions are fairly densely packed together, and a range scan (#2) might 
have worked better. But for that case the app should probably be using the 
setTimeRange() API instead.
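
A rough way to see the trade-off is to count work in a toy model. The sketch
below is plain Java, not HBase code: the column is assumed to be an in-memory
sorted map, "cost" is just the number of entries a min/max range scan walks
versus the number of seeks point lookups issue, and the names
(LookupCostSketch, rangeScanEntries, pointLookupSeeks, columnWithVersions) are
hypothetical. It ignores block boundaries and caching entirely.

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy cost model for range scan (#2) versus point lookups (#3). Not HBase code. */
final class LookupCostSketch {

    /** Entries a min/max range scan would walk in this model. */
    static int rangeScanEntries(NavigableMap<Long, String> column, TreeSet<Long> wanted) {
        if (wanted.isEmpty()) {
            return 0;
        }
        return column.subMap(wanted.first(), true, wanted.last(), true).size();
    }

    /** Seeks issued by per-version point lookups: one per wanted version. */
    static int pointLookupSeeks(TreeSet<Long> wanted) {
        return wanted.size();
    }

    /** Builds a column holding versions 1..count. */
    static NavigableMap<Long, String> columnWithVersions(long count) {
        NavigableMap<Long, String> column = new TreeMap<>();
        for (long ts = 1; ts <= count; ts++) {
            column.put(ts, "event-" + ts);
        }
        return column;
    }
}
```

With 10000 versions and a wanted set of {1, 10000}, the range scan walks all
10000 entries while point lookups issue 2 seeks. With a dense wanted set such
as 5000 through 5009, the range scan walks 10 entries and point lookups issue
10 seeks, so the scan is no longer at a disadvantage.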




> Add ability to extract a specified list of versions of a column in a single 
> roundtrip
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2793
>                 URL: https://issues.apache.org/jira/browse/HBASE-2793
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> In one of the use cases we were looking at, each row contains a single 
> column, but with several versions (e.g., each version representing an event 
> in a log), and we want to be able to extract a specific set of versions from 
> the row in a single round-trip.
> Currently, on a Get, one can retrieve a specific version of a column using 
> setTimeStamp(ts) or a range of versions using setTimeRange(min, max). But not 
> a set of specified versions. It would be useful to add this ability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
