[
https://issues.apache.org/jira/browse/HBASE-33?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609318#action_12609318
]
Jim Kellerman commented on HBASE-33:
------------------------------------
Implementing a scanner that returns multiple Cells for a single column is going
to force an API change. Google's API for scanners is somewhat different from
HBase's:
{code}
Scanner scanner(T);
ScanStream* stream;
stream = scanner.FetchColumnFamily("anchor");
stream->SetReturnAllVersions();
scanner.Lookup("com.cnn.www");
for (; !stream->Done(); stream->Next()) {
  printf("%s %s %lld %s\n",
         scanner.RowName(),
         stream->ColumnName(),
         stream->MicroTimestamp(),
         stream->Value());
}
{code}
In HBase, we currently cannot retrieve values for multiple timestamps for the
same column:
{code}
HTable t = new HTable(conf, "tableName");
Scanner s = t.getScanner(columns, startRow, timestamp, filter);
try {
  RowResult r = null;
  while ((r = s.next()) != null) {
    System.out.print(Bytes.toString(r.getRow()));
    for (Map.Entry<byte[], Cell> column : r.entrySet()) {
      System.out.print(" " + Bytes.toString(column.getKey()));
      // Each column maps to a single Cell, i.e. one timestamp/value pair.
      Cell c = column.getValue();
      System.out.println(" " + c.getTimestamp() + " " +
          Bytes.toString(c.getValue()));
    }
  }
} finally {
  s.close();
}
{code}
The problem is, how do we return multiple Cells per column, without seriously
breaking the client API? Proposed solution:
- make Cell implement Iterable
- getValue() returns the "current" value
- getTimestamp() returns the "current" timestamp
- hasNext() returns true if there are more values
- next() advances to the "next" value/timestamp
Initially the "current" value/timestamp point to the first timestamp/value,
which preserves the current API.
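A minimal sketch of how such a cursor-style Cell might behave, for discussion only. The class name VersionedCell is hypothetical and this is not the existing HBase Cell class. It assumes that "first" means the newest version, and it models hasNext()/next() as plain methods (next() advances and returns nothing) rather than committing to a particular java.util interface:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical stand-in for the proposed Cell; not the actual HBase class.
final class VersionedCell {
    // Parallel lists, ordered newest timestamp first (an assumption).
    private final List<Long> timestamps;
    private final List<byte[]> values;
    private int current = 0;

    VersionedCell(Map<Long, byte[]> valuesByTimestamp) {
        if (valuesByTimestamp.isEmpty()) {
            throw new IllegalArgumentException("a cell needs at least one version");
        }
        TreeMap<Long, byte[]> newestFirst = new TreeMap<>(Comparator.<Long>reverseOrder());
        newestFirst.putAll(valuesByTimestamp);
        this.timestamps = new ArrayList<>(newestFirst.keySet());
        this.values = new ArrayList<>(newestFirst.values());
    }

    // Value of the "current" version; initially the first (newest) one.
    byte[] getValue() {
        return values.get(current);
    }

    // Timestamp of the "current" version.
    long getTimestamp() {
        return timestamps.get(current);
    }

    // True if there is at least one more version after the current one.
    boolean hasNext() {
        return current + 1 < values.size();
    }

    // Advances the "current" position to the next version.
    void next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no further versions");
        }
        current++;
    }

    public static void main(String[] args) {
        Map<Long, byte[]> versions = new HashMap<>();
        versions.put(10L, "v1".getBytes(StandardCharsets.UTF_8));
        versions.put(30L, "v3".getBytes(StandardCharsets.UTF_8));
        versions.put(20L, "v2".getBytes(StandardCharsets.UTF_8));

        VersionedCell c = new VersionedCell(versions);
        while (true) {
            System.out.println(c.getTimestamp() + " "
                + new String(c.getValue(), StandardCharsets.UTF_8));
            if (!c.hasNext()) {
                break;
            }
            c.next();
        }
    }
}
```

A client that only calls getValue() and getTimestamp() sees the first version and needs no changes, which is the compatibility property the proposal is after.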
Comments?
> Add a HTable get/obtainScanner method that retrieves all versions of a
> particular column and row between two timestamps
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-33
> URL: https://issues.apache.org/jira/browse/HBASE-33
> Project: Hadoop HBase
> Issue Type: New Feature
> Components: client
> Reporter: Peter Dolan
> Assignee: Jim Kellerman
> Priority: Blocker
> Fix For: 0.2.0
>
>
> The use case:
> * A weblog application for which rows are user ids and posts are stored in a
> single column, with post date specified by the cell's timestamp. The
> application would then need to be able to display all posts for the last week
> or month.
> * A feedfetcher for which rows are URLs and feed posts are stored in a single
> column with the post publish date or fetch time stored in the cell's
> timestamp. The application would then need to be able to display all posts
> for the last week or month.
> Proposed API:
> // Get all versions of the specified row and column whose timestamps are in
> // [minTimestamp, maxTimestamp]
> SortedMap<Long, byte[]> getTimestamps(Text row, Text column,
>     long minTimestamp, long maxTimestamp);
> // Get all versions of the specified row and column whose timestamps are >=
> // minTimestamp
> SortedMap<Long, byte[]> getTimestamps(Text row, Text column,
>     long minTimestamp);
> I'd be happy to take this on myself, as I need it for the above use cases
> before migrating my application over to HBase.
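To make the range semantics of the proposed getTimestamps concrete, here is a rough in-memory sketch. It is not HBase code: the class name VersionRange is hypothetical, and a NavigableMap keyed by timestamp stands in for the versions of one row and column, replacing the row and column arguments. Both bounds are treated as inclusive, matching [minTimestamp, maxTimestamp] in the description:

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper; a NavigableMap stands in for one row/column's versions.
final class VersionRange {
    private VersionRange() {}

    // All versions with minTimestamp <= timestamp <= maxTimestamp.
    // Throws IllegalArgumentException if minTimestamp > maxTimestamp.
    static SortedMap<Long, byte[]> getTimestamps(
            NavigableMap<Long, byte[]> versions, long minTimestamp, long maxTimestamp) {
        // Copy so the result is independent of the backing map.
        return new TreeMap<>(versions.subMap(minTimestamp, true, maxTimestamp, true));
    }

    // All versions with timestamp >= minTimestamp.
    static SortedMap<Long, byte[]> getTimestamps(
            NavigableMap<Long, byte[]> versions, long minTimestamp) {
        return new TreeMap<>(versions.tailMap(minTimestamp, true));
    }

    public static void main(String[] args) {
        NavigableMap<Long, byte[]> versions = new TreeMap<>();
        versions.put(100L, new byte[] {1});
        versions.put(200L, new byte[] {2});
        versions.put(300L, new byte[] {3});

        System.out.println(getTimestamps(versions, 100L, 200L).keySet());
        System.out.println(getTimestamps(versions, 200L).keySet());
    }
}
```

For the weblog use case, "all posts for the last week" would correspond to the single-bound form with minTimestamp set to one week before the current time.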