[jira] Commented: (HBASE-33) Add a HTable get/obtainScanner method that retrieves all versions of a particular column and row between two timestamps

Jim Kellerman (JIRA) Mon, 30 Jun 2008 11:16:06 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-33?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609318#action_12609318
 ]


Jim Kellerman commented on HBASE-33:
------------------------------------

Implementing a scanner that returns multiple Cells for a single column is going 
to force an API change. Google's API for scanners is somewhat different from 
HBase's:

{code}
Scanner scanner(T);
ScanStream* stream;
stream = scanner.FetchColumnFamily("anchor");
stream->SetReturnAllVersions();
scanner.Lookup("com.cnn.www");
for (; !stream->Done(); stream->Next()) {
  printf("%s %s %lld %s\n",
    scanner.RowName(),
    stream->ColumnName(),
    stream->MicroTimestamp(),
    stream->Value());
}
{code}

In HBase, we currently cannot retrieve values for multiple timestamps for the 
same column:

{code}
HTable t = new HTable(conf, "tableName");
Scanner s = t.getScanner(columns, startRow, timestamp, filter);
try {
  RowResult r = null;
  while ((r = s.next()) != null) {
    System.out.print(Bytes.toString(r.getRow()));
    for (Map.Entry<byte[], Cell> column: r.entrySet()) {
      System.out.print(" " + Bytes.toString(column.getKey()));
      Cell c = column.getValue();
      System.out.println(" " + c.getTimestamp() + " " +
          Bytes.toString(c.getValue()));
    }
  }
} finally {
  s.close();
}
{code}

The problem is how to return multiple Cells per column without seriously 
breaking the client API. Proposed solution:
- make Cell implement Iterable
- getValue() returns the "current" value
- getTimestamp() returns the "current" timestamp
- hasNext() returns true if there are more values
- next() advances to the "next" value/timestamp

Initially the "current" value/timestamp point to the first timestamp/value, 
which preserves the current API.
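
A minimal sketch of the cursor behavior proposed above, for discussion only. 
VersionedCell and its nested Version class are hypothetical stand-ins, not the 
existing Cell class, and the constructor signature is an assumption made so 
the example is self-contained:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for Cell that holds several versions of one column. */
class VersionedCell implements Iterable<VersionedCell.Version> {

  /** One (timestamp, value) pair. */
  static final class Version {
    final long timestamp;
    final byte[] value;

    Version(long timestamp, byte[] value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  private final List<Version> versions = new ArrayList<Version>();
  private int current = 0; // index of the "current" version

  /** Assumed constructor: parallel arrays, in the order they should be returned. */
  VersionedCell(long[] timestamps, byte[][] values) {
    if (timestamps.length == 0 || timestamps.length != values.length) {
      throw new IllegalArgumentException("need one value per timestamp, at least one");
    }
    for (int i = 0; i < timestamps.length; i++) {
      versions.add(new Version(timestamps[i], values[i]));
    }
  }

  /** Value of the "current" version; initially the first one. */
  byte[] getValue() {
    return versions.get(current).value;
  }

  /** Timestamp of the "current" version; initially the first one. */
  long getTimestamp() {
    return versions.get(current).timestamp;
  }

  /** True if there is a version after the current one. */
  boolean hasNext() {
    return current + 1 < versions.size();
  }

  /** Advances the cursor to the next version. */
  void next() {
    if (!hasNext()) {
      throw new NoSuchElementException("no more versions");
    }
    current++;
  }

  /** Read-only iteration over all versions, independent of the cursor. */
  @Override
  public Iterator<Version> iterator() {
    return Collections.unmodifiableList(versions).iterator();
  }
}
```

A caller that only uses getValue() and getTimestamp() sees the first version, 
as it does today; a caller that wants every version loops with hasNext() and 
next(), or uses a for-each loop over the cell.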

Comments?


> Add a HTable get/obtainScanner method that retrieves all versions of a 
> particular column and row between two timestamps
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-33
>                 URL: https://issues.apache.org/jira/browse/HBASE-33
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>            Reporter: Peter Dolan
>            Assignee: Jim Kellerman
>            Priority: Blocker
>             Fix For: 0.2.0
>
>
> The use case:
> * A weblog application for which rows are user ids and posts are stored in a 
> single column, with post date specified by the cell's timestamp.  The 
> application would then need to be able to display all posts for the last week 
> or month.
> * A feedfetcher for which rows are URLs and feed posts are stored in a single 
> column with the post publish date or fetch time stored in the cell's 
> timestamp.  The application would then need to be able to display all posts 
> for the last week or month.
> Proposed API:
> // Get all versions of the specified row and column whose timestamps are in
> // [minTimestamp, maxTimestamp]
> SortedMap<Long, byte[]> getTimestamps(Text row, Text column,
>     long minTimestamp, long maxTimestamp);
> // Get all versions of the specified row and column whose timestamps are
> // >= minTimestamp
> SortedMap<Long, byte[]> getTimestamps(Text row, Text column,
>     long minTimestamp);
> I'd be happy to take this on myself, as I need it for the above use cases 
> before migrating my application over to HBase.
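
The range semantics in the quoted proposal (closed interval on both ends, or 
open-ended above) can be illustrated with a small in-memory sketch. This only 
demonstrates the timestamp filtering; VersionRange is a hypothetical helper, 
and passing the versions in as a TreeMap is an assumption made in place of a 
real table lookup by row and column:

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper showing only the timestamp-range filtering. */
class VersionRange {

  /**
   * Returns a copy of the versions whose timestamps lie in
   * [minTimestamp, maxTimestamp], both ends inclusive, in ascending order.
   * The input map must use natural (ascending) key ordering.
   */
  static SortedMap<Long, byte[]> getTimestamps(
      TreeMap<Long, byte[]> versions, long minTimestamp, long maxTimestamp) {
    if (minTimestamp > maxTimestamp) {
      // subMap would throw on an inverted range; treat it as empty instead.
      return new TreeMap<Long, byte[]>();
    }
    return new TreeMap<Long, byte[]>(
        versions.subMap(minTimestamp, true, maxTimestamp, true));
  }

  /** Returns a copy of the versions whose timestamps are >= minTimestamp. */
  static SortedMap<Long, byte[]> getTimestamps(
      TreeMap<Long, byte[]> versions, long minTimestamp) {
    return getTimestamps(versions, minTimestamp, Long.MAX_VALUE);
  }
}
```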

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
