[jira] [Commented] (HBASE-9343) Implement stateless scanner for Stargate

Nick Dimiduk (JIRA) Fri, 13 Sep 2013 16:41:45 -0700

    [ https://issues.apache.org/jira/browse/HBASE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767178#comment-13767178 ]


Nick Dimiduk commented on HBASE-9343:
-------------------------------------

This is more of a design review comment than something specific to your patch. 
Let me know what you think.

I'm a fan of rolling out a streaming API for accessing CellSets. However, I 
think patch v2 adds to the existing confusion. From a data model perspective, 
HBase starts at the top with tables (well, namespaces now, but ignore that), 
followed by rows. For someone exploring the API, getting a listing of top-level 
entities makes sense (it would be nice to also get basic cluster info here, but 
that's a separate issue):

{noformat}
$ curl ... http://localhost:8080/ ; echo
{"table":[{"name":"foo"}]}
{noformat}

The next logical step would be to get information about the table (i.e., the 
schema) using {{GET /<table>}}. Instead, they have to use 
{{GET /<table>/schema}}. Let's set this aside for a minute; I'll come back to it.

After that, {{GET /<table>/<rowkey>}} works as expected (though it means you 
can't request a row whose key is 'schema').

{noformat}
$ curl ... http://localhost:8080/foo/r1 ; echo
{"Row":[{"key":"cjE=","Cell":[{"column":"ZjE6","timestamp":1379113061705,"$":"ZW1wdHkh"},{"column":"ZjE6YmFy","timestamp":1379113067612,"$":"YmF6"}]}]}
{noformat}

According to the HBase data model, I think this makes good sense.
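Side note for anyone following along: the {{key}}, {{column}}, and {{$}} fields 
in these responses are base64-encoded. Below is a minimal Python sketch for 
decoding a JSON CellSet, assuming keys, columns, and values are UTF-8 text. The 
names {{decode_cellset}} and {{_b64_text}} are just illustrative, not part of 
Stargate:

```python
import base64
import json


def _b64_text(value: str) -> str:
    """Base64-decode a field and read the bytes as UTF-8 (an assumption)."""
    return base64.b64decode(value).decode("utf-8")


def decode_cellset(body: str) -> list[dict]:
    """Turn a JSON CellSet response into rows with human-readable strings."""
    rows = []
    for row in json.loads(body)["Row"]:
        cells = [
            {
                "column": _b64_text(cell["column"]),  # "family:qualifier"
                "timestamp": cell["timestamp"],       # plain number, not encoded
                "value": _b64_text(cell["$"]),        # "$" holds the cell value
            }
            for cell in row.get("Cell", [])
        ]
        rows.append({"key": _b64_text(row["key"]), "cells": cells})
    return rows
```

Run against the response above, it yields row {{r1}} with {{f1:}} = {{empty!}} 
and {{f1:bar}} = {{baz}}.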

You can also perform simple prefix-filtered scans using the magical "\*" (glob) 
character (which, again, prevents anyone from requesting the '\*' rowkey).

{noformat}
$ curl ... http://localhost:8080/foo/r* ; echo
{"Row":[{"key":"cjE=","Cell":[{"column":"ZjE6","timestamp":1379113061705,"$":"ZW1wdHkh"},{"column":"ZjE6YmFy","timestamp":1379113067612,"$":"YmF6"}]}]}
{noformat}

In a nicely self-consistent way, {{GET /<table>/*}} returns a full table scan.

{noformat}
$ curl ... http://localhost:8080/foo/* ; echo
{"Row":[{"key":"cjE=","Cell":[{"column":"ZjE6","timestamp":1379113061705,"$":"ZW1wdHkh"},{"column":"ZjE6YmFy","timestamp":1379113067612,"$":"YmF6"}]},{"key":"c2NoZW1h","Cell":[{"column":"ZjE6Zm9v","timestamp":1379114118517,"$":"ZG9lcyB0aGlzIHdvcms/"}]}]}
{noformat}
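Decoded, the second row key in that output ({{c2NoZW1h}}) is {{schema}}, which 
is exactly the row that {{GET /foo/schema}} can't return. For the glob itself, a 
common way to turn a row-key prefix into a scan is to use the prefix as the start 
row and, as an exclusive stop row, the prefix with its last byte incremented. The 
Python sketch below assumes that approach. It is not necessarily how Stargate 
computes the range, and {{prefix_to_range}}/{{glob_scan}} are hypothetical helpers:

```python
from typing import Optional


def prefix_to_range(prefix: bytes) -> tuple[bytes, Optional[bytes]]:
    """Map a row-key prefix to a [start, stop) range; None means no upper bound."""
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:
        stop.pop()  # 0xFF can't be incremented; drop it and carry leftwards
    if not stop:
        return prefix, None  # empty or all-0xFF prefix: scan to the end
    stop[-1] += 1
    return prefix, bytes(stop)


def glob_scan(rows: list[bytes], pattern: bytes) -> list[bytes]:
    """Filter row keys with a trailing-'*' pattern, like GET /<table>/<prefix>*."""
    if not pattern.endswith(b"*"):
        raise ValueError("pattern must end with '*'")
    start, stop = prefix_to_range(pattern[:-1])
    return [r for r in rows if r >= start and (stop is None or r < stop)]
```

Against the two rows above ({{r1}}, {{schema}}), {{r*}} keeps only {{r1}} and a 
bare {{*}} keeps both, matching the two responses.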

This patch introduces {{GET /<table>}} not as table resource info, but as a way 
to list rows. Per my earlier comment, I think this should be reserved for table 
info.

Does it make sense to instead roll this new streaming scanner stuff into the 
{{GET /<table>/\*}} functionality? '\*' is special anyway, so why not extend it 
with these scanner creation query parameters? That way, we can move to an API 
that behaves like:

{noformat}
GET / => table list (and maybe cluster info?)
GET /<table> => table info (existing /<table>/schema)
GET /<table>/<rowkey> => existing behavior (+ your new streaming hotness?)
GET /<table>/<optional_prefix>* => existing behavior (+ your new streaming hotness!)
GET /<table>/<optional_prefix>*?<filter_args...> => all your new streaming hotness plus implied rowkey prefix filter.
{noformat}

I think this starts to look like a more idiomatic REST API. What do you guys 
think?

(We should also figure out and document how a user retrieves their precious 
data hidden behind the rowkeys '\*', 'schema', &c.)
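To make the proposed layout concrete, here's a rough Python sketch of how a 
server could classify GET requests under it. The resource names returned 
({{table_list}}, {{table_info}}, {{row}}, {{scan}}) are made up for illustration, 
and deeper paths (e.g. column-qualified gets) are deliberately left out:

```python
from urllib.parse import parse_qs, urlsplit


def route(url: str) -> tuple:
    """Classify a GET URL under the proposed layout (hypothetical names)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return ("table_list",)  # GET /
    table = segments[0]
    if len(segments) == 1:
        return ("table_info", table)  # GET /<table> (was /<table>/schema)
    if len(segments) > 2:
        raise ValueError("only /, /<table> and /<table>/<rowspec> are sketched")
    rowspec = segments[1]
    if rowspec.endswith("*"):
        # Text before the trailing '*' is the implied row-key prefix;
        # query parameters carry the scanner arguments.
        return ("scan", table, rowspec[:-1], parse_qs(parts.query))
    return ("row", table, rowspec)
```

Under this layout {{/foo/schema}} is just the row {{schema}}, so that collision 
goes away; a row key ending in {{*}} would still collide with the scan syntax 
and needs some escaping convention.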
                
> Implement stateless scanner for Stargate
> ----------------------------------------
>
>                 Key: HBASE-9343
>                 URL: https://issues.apache.org/jira/browse/HBASE-9343
>             Project: HBase
>          Issue Type: Improvement
>          Components: REST
>    Affects Versions: 0.94.11
>            Reporter: Vandana Ayyalasomayajula
>            Assignee: Vandana Ayyalasomayajula
>            Priority: Minor
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: HBASE-9343_94.00.patch, HBASE-9343_94.01.patch, 
> HBASE-9343_trunk.00.patch, HBASE-9343_trunk.01.patch, 
> HBASE-9343_trunk.01.patch, HBASE-9343_trunk.02.patch
>
>
> The current scanner implementation stores state and hence is not well suited 
> to REST server failure scenarios. This JIRA proposes to implement a stateless 
> scanner. In the first version of the patch, a new resource class 
> "ScanResource" has been added, and all the scan parameters are specified as 
> query params.
> The following are the scan parameters:
> startrow - The start row for the scan.
> endrow - The end row for the scan.
> columns - The columns to scan.
> starttime, endtime - To retrieve only columns within a specific range of 
> version timestamps; both start and end time must be specified.
> maxversions - To limit the number of versions of each column to be returned.
> batchsize - To limit the maximum number of values returned for each call to 
> next().
> limit - The maximum number of rows to return in the scan operation.
> More on the start row, end row and limit parameters:
> 1. If start row, end row and limit are not specified, then the whole table 
> will be scanned.
> 2. If start row and limit (say N) are specified, then the scan operation will 
> return N rows from the start row specified.
> 3. If only the limit parameter (N) is specified, then the scan operation will 
> return N rows from the start of the table.
> 4. If limit (N) and end row are specified, then the scan operation will return 
> N rows from the start of the table up to the end row. If the end row is 
> reached before N rows (say M, with M < N), then M rows will be returned to 
> the user.
> 5. If start row, end row and limit (say N) are specified and N < number of 
> rows between start row and end row, then N rows from the start row will be 
> returned to the user. If N > M, where M is the number of rows between start 
> row and end row, then M rows will be returned to the user.
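For reference, here is a Python sketch of the parameters and the start row/end 
row/limit rules above. It makes two assumptions the description doesn't state: 
{{endrow}} is treated as exclusive (like the stop row of an HBase {{Scan}}), and 
{{columns}} is passed as a single string, since the encoding of multiple columns 
isn't specified. {{scan_query}} and {{simulate_scan}} are hypothetical names:

```python
from bisect import bisect_left
from typing import Optional
from urllib.parse import urlencode

# Parameter names from the issue description; the ordering here is arbitrary.
SCAN_PARAMS = ("startrow", "endrow", "columns", "starttime", "endtime",
               "maxversions", "batchsize", "limit")


def scan_query(**params) -> str:
    """Build a query string for the stateless scanner (sketch only)."""
    unknown = set(params) - set(SCAN_PARAMS)
    if unknown:
        raise ValueError(f"unknown scan parameters: {sorted(unknown)}")
    # The description requires start and end time to be given together.
    if ("starttime" in params) != ("endtime" in params):
        raise ValueError("starttime and endtime must be specified together")
    return urlencode([(k, params[k]) for k in SCAN_PARAMS if k in params])


def simulate_scan(rows: list[str], startrow: Optional[str] = None,
                  endrow: Optional[str] = None,
                  limit: Optional[int] = None) -> list[str]:
    """Apply rules 1-5 to a sorted list of row keys (endrow exclusive)."""
    lo = bisect_left(rows, startrow) if startrow is not None else 0
    hi = bisect_left(rows, endrow) if endrow is not None else len(rows)
    selected = rows[lo:hi]  # empty if startrow sorts after endrow
    return selected if limit is None else selected[:limit]
```

With rows {{a}} through {{e}}, {{simulate_scan(rows, endrow="c", limit=5)}} 
returns {{["a", "b"]}}, which is rule 4 with M = 2 < N = 5.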

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
