[ 
https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680649#action_12680649
 ] 

Jonathan Gray commented on HBASE-1249:
--------------------------------------

As part of reworking basically everything, I'd like to propose that we change 
the server-side methods to allow optimizations wherever possible, and change 
the client APIs to reflect the implementation more closely.

A _very_ rough draft to show what I'm talking about:

getColumnsLatest(byte [] row, byte [][] columns)  - only takes columns, no 
families
getFamiliesLatest(byte [] row, byte [][] families)  - only takes families

getColumnsVersions(byte [] row, byte [][] columns, int numVersions)

getColumnsVersionsAfter(byte [] row, byte [][] columns, long afterStamp)
getColumnsVersionsBefore(byte [] row, byte [][] columns, long beforeStamp)

getLatest(byte [] row)  - implementation is the same as getFamiliesLatest() 
with all families specified
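
To make the draft above concrete, here is a minimal Java sketch of those read 
methods over a small in-memory map.  Everything beyond the method names and 
parameters in the draft is an assumption made for illustration: the class name 
ReadApiSketch, the Cell return type, the put() and bytes() helpers, the 
"family:qualifier" column naming, and treating afterStamp/beforeStamp as 
exclusive bounds.  It is not HBase code, it needs Java 9+ for Arrays.compare, 
and it only shows the shape of the calls.  The methods share one helper here 
for brevity, so it says nothing about the server-side optimizations.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory model of the draft read methods; not HBase code. */
public class ReadApiSketch {
    /** One returned cell (a type invented for this sketch). */
    public static final class Cell {
        public final byte[] column;
        public final long timestamp;
        public final byte[] value;
        Cell(byte[] column, long timestamp, byte[] value) {
            this.column = column; this.timestamp = timestamp; this.value = value;
        }
    }

    // row -> column ("family:qualifier") -> timestamp, newest first -> value
    private final Map<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> rows =
        new TreeMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>>(Arrays::compare);

    public static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public void put(byte[] row, byte[] column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<byte[], NavigableMap<Long, byte[]>>(Arrays::compare))
            .computeIfAbsent(column, c -> new TreeMap<Long, byte[]>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    private NavigableMap<byte[], NavigableMap<Long, byte[]>> columnsOf(byte[] row) {
        NavigableMap<byte[], NavigableMap<Long, byte[]>> r = rows.get(row);
        return r != null ? r : new TreeMap<byte[], NavigableMap<Long, byte[]>>(Arrays::compare);
    }

    // Appends at most `limit` versions of one column, newest first.
    private static void addAll(List<Cell> out, byte[] column,
                               NavigableMap<Long, byte[]> versions, int limit) {
        for (Map.Entry<Long, byte[]> e : versions.entrySet()) {
            if (limit-- <= 0) break;
            out.add(new Cell(column, e.getKey(), e.getValue()));
        }
    }

    private List<Cell> read(byte[] row, byte[][] columns, int limit,
                            UnaryOperator<NavigableMap<Long, byte[]>> slice) {
        List<Cell> out = new ArrayList<>();
        for (byte[] column : columns) {
            NavigableMap<Long, byte[]> v = columnsOf(row).get(column);
            if (v != null) addAll(out, column, slice.apply(v), limit);
        }
        return out;
    }

    public List<Cell> getColumnsLatest(byte[] row, byte[][] columns) {
        return read(row, columns, 1, v -> v);
    }

    public List<Cell> getColumnsVersions(byte[] row, byte[][] columns, int numVersions) {
        return read(row, columns, numVersions, v -> v);
    }

    // The version map is sorted newest first, so headMap() yields newer timestamps.
    public List<Cell> getColumnsVersionsAfter(byte[] row, byte[][] columns, long afterStamp) {
        return read(row, columns, Integer.MAX_VALUE, v -> v.headMap(afterStamp, false));
    }

    public List<Cell> getColumnsVersionsBefore(byte[] row, byte[][] columns, long beforeStamp) {
        return read(row, columns, Integer.MAX_VALUE, v -> v.tailMap(beforeStamp, false));
    }

    /** Latest version of every column whose name starts with "family:". */
    public List<Cell> getFamiliesLatest(byte[] row, byte[][] families) {
        List<Cell> out = new ArrayList<>();
        for (byte[] family : families) {
            byte[] prefix = Arrays.copyOf(family, family.length + 1);
            prefix[family.length] = ':';
            for (Map.Entry<byte[], NavigableMap<Long, byte[]>> e
                     : columnsOf(row).tailMap(prefix, true).entrySet()) {
                byte[] column = e.getKey();
                if (column.length < prefix.length
                        || !Arrays.equals(prefix, Arrays.copyOf(column, prefix.length))) break;
                addAll(out, column, e.getValue(), 1);
            }
        }
        return out;
    }

    /** Same result as getFamiliesLatest() with all families specified. */
    public List<Cell> getLatest(byte[] row) {
        List<Cell> out = new ArrayList<>();
        columnsOf(row).forEach((column, v) -> addAll(out, column, v, 1));
        return out;
    }
}
```

Note that none of these calls lets the caller pass an explicit timestamp to 
the "latest" or "N versions" reads; time-bounded reads go through the separate 
After/Before methods.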


It's easy to see now that splitting families and columns into two fields will 
not work at all with the current API.  We need a more hierarchical client API, 
client utilities, something more like BatchUpdate even for reads, ...

Also, when dealing with versions (or latest), we will not be able to do most of 
the optimizations if the client can manually specify the timestamp as described 
above.

There are a few reasons to do this.  For one, it makes it clearer to users how 
things are being implemented.  But more importantly, it makes sure we're 
writing a server-side method for each of the different cases for which we can 
make optimizations.  Right now getting explicitly listed columns shares code 
with getting all columns for explicitly listed families.  These two things each 
contain their own unique possibilities for optimization.  There are also 
different optimizations to be made for deletes, and more well-defined read 
types will make the cell cache easier.

> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
>                 Key: HBASE-1249
>                 URL: https://issues.apache.org/jira/browse/HBASE-1249
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> To discuss all the new and potential issues coming out of the change in key 
> format (HBASE-1234): zero-copy reads, client binary protocol, update of API 
> (HBASE-880), server optimizations, etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
