[
https://issues.apache.org/jira/browse/HBASE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767178#comment-13767178
]
Nick Dimiduk commented on HBASE-9343:
-------------------------------------
This is more of a design review/comment than anything specific to your patch.
Let me know what you think.
I'm a fan of rolling out a streaming API for accessing CellSets. However, I
think patch v2 is adding to existing confusion. From a data model perspective,
HBase starts at the top with tables (well, namespaces now, but ignore that),
followed by rows. For someone exploring the API, getting a listing of top-level
entities makes sense (it would be nice to also get basic cluster info here, but
that's a separate issue):
{noformat}
$ curl ... http://localhost:8080/ ; echo
{"table":[{"name":"foo"}]}
{noformat}
The next logical step would be to get information about the table (i.e., the
schema) using {{GET /<table>}}. Instead, they have to use {{GET
/<table>/schema}}. Let's set this aside for a minute; I'll come back to it.
After that, {{GET /<table>/<rowkey>}} works as expected (though you're excluded
from requesting the 'schema' rowkey).
{noformat}
$ curl ... http://localhost:8080/foo/r1 ; echo
{"Row":[{"key":"cjE=","Cell":[{"column":"ZjE6","timestamp":1379113061705,"$":"ZW1wdHkh"},{"column":"ZjE6YmFy","timestamp":1379113067612,"$":"YmF6"}]}]}
{noformat}
According to the HBase data model, I think this makes good sense.
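As a reading aid for the response above: the row key, column, and value fields are Base64-encoded. Below is a minimal Python sketch for decoding them, assuming the JSON shape shown in the example. The helper name {{decode_cellset}} is made up for illustration and is not part of any HBase client.

```python
import base64
import json


def decode_cellset(body: str) -> list:
    """Decode the Base64 fields of a CellSet JSON body into plain strings."""
    def b64(text: str) -> str:
        return base64.b64decode(text).decode("utf-8")

    rows = []
    for row in json.loads(body)["Row"]:
        cells = [
            {
                "column": b64(cell["column"]),
                "timestamp": cell["timestamp"],
                "value": b64(cell["$"]),
            }
            for cell in row["Cell"]
        ]
        rows.append({"key": b64(row["key"]), "cells": cells})
    return rows


# Response body copied from the GET /foo/r1 example above.
body = (
    '{"Row":[{"key":"cjE=","Cell":['
    '{"column":"ZjE6","timestamp":1379113061705,"$":"ZW1wdHkh"},'
    '{"column":"ZjE6YmFy","timestamp":1379113067612,"$":"YmF6"}]}]}'
)

for row in decode_cellset(body):
    # prints: r1 [('f1:', 'empty!'), ('f1:bar', 'baz')]
    print(row["key"], [(c["column"], c["value"]) for c in row["cells"]])
```

Decoded this way, the example row is {{r1}} with the cells {{f1:}} and {{f1:bar}}.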
You can also perform simple prefix-filtered scans using the magical "\*" (glob)
character (again, excluding people from requesting the '\*' rowkey).
{noformat}
$ curl ... http://localhost:8080/foo/r* ; echo
{"Row":[{"key":"cjE=","Cell":[{"column":"ZjE6","timestamp":1379113061705,"$":"ZW1wdHkh"},{"column":"ZjE6YmFy","timestamp":1379113067612,"$":"YmF6"}]}]}
{noformat}
Nicely self-consistent, {{GET /<table>/*}} returns a full table scan.
{noformat}
$ curl ... http://localhost:8080/foo/* ; echo
{"Row":[{"key":"cjE=","Cell":[{"column":"ZjE6","timestamp":1379113061705,"$":"ZW1wdHkh"},{"column":"ZjE6YmFy","timestamp":1379113067612,"$":"YmF6"}]},{"key":"c2NoZW1h","Cell":[{"column":"ZjE6Zm9v","timestamp":1379114118517,"$":"ZG9lcyB0aGlzIHdvcms/"}]}]}
{noformat}
This patch introduces {{GET /<table>}} not as table resource info, but as a way
to list rows. Per my earlier comment, I think this should be reserved for table
info.
Does it make sense to instead roll this new streaming scanner stuff into the
{{GET /<table>/\*}} functionality? '\*' is special anyway, so why not extend it
with these scanner creation query parameters? That way, we can move to an API
that behaves like:
{noformat}
GET /                                            => table list (and maybe cluster info?)
GET /<table>                                     => table info (existing /<table>/schema)
GET /<table>/<rowkey>                            => existing behavior (+ your new streaming hotness?)
GET /<table>/<optional_prefix>*                  => existing behavior (+ your new streaming hotness!)
GET /<table>/<optional_prefix>*?<filter_args...> => all your new streaming hotness plus implied rowkey prefix filter.
{noformat}
I think this starts to look like a more idiomatic REST API. What do you guys
think?
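To make the proposed layout concrete, here is a rough Python sketch of how a request path could be mapped onto those five cases. It is illustrative only: the function name {{classify}} and the returned dictionary shape are hypothetical, it is not how the REST server actually routes requests, and it assumes that 'schema' becomes an ordinary rowkey once table info moves to {{GET /<table>}}, which the proposal above does not explicitly decide.

```python
from urllib.parse import parse_qsl, urlsplit


def classify(url: str) -> dict:
    """Map a request URL onto the resource layout proposed above."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query = dict(parse_qsl(parts.query))

    if not segments:
        return {"resource": "table_list"}
    if len(segments) == 1:
        return {"resource": "table_info", "table": segments[0]}

    # Any further path segments are ignored in this sketch.
    table, rowspec = segments[0], segments[1]
    if rowspec.endswith("*"):
        # A trailing '*' means scan; whatever precedes it is the rowkey prefix.
        return {
            "resource": "scan",
            "table": table,
            "prefix": rowspec[:-1],
            "scan_args": query,
        }
    return {"resource": "row", "table": table, "rowkey": rowspec}


for url in ["/", "/foo", "/foo/r1", "/foo/r*", "/foo/*?limit=10"]:
    print(url, "->", classify(url))
```

Note that a rowkey that itself ends in '\*' is still unreachable in this sketch, which is the same problem raised in the parenthetical below.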
(We should also figure out and document how a user retrieves their precious
data hidden behind the rowkeys '\*', 'schema', &c.)
> Implement stateless scanner for Stargate
> ----------------------------------------
>
> Key: HBASE-9343
> URL: https://issues.apache.org/jira/browse/HBASE-9343
> Project: HBase
> Issue Type: Improvement
> Components: REST
> Affects Versions: 0.94.11
> Reporter: Vandana Ayyalasomayajula
> Assignee: Vandana Ayyalasomayajula
> Priority: Minor
> Fix For: 0.98.0, 0.96.0
>
> Attachments: HBASE-9343_94.00.patch, HBASE-9343_94.01.patch,
> HBASE-9343_trunk.00.patch, HBASE-9343_trunk.01.patch,
> HBASE-9343_trunk.01.patch, HBASE-9343_trunk.02.patch
>
>
> The current scanner implementation stores state and hence is not
> very suitable for REST server failure scenarios. The current JIRA proposes to
> implement a stateless scanner. In the first version of the patch, a new
> resource class "ScanResource" has been added and all the scan parameters will
> be specified as query params.
> The following are the scan parameters
> startrow - The start row for the scan.
> endrow - The end row for the scan.
> columns - The columns to scan.
> starttime, endtime - To only retrieve columns within a specific range of
> version timestamps; both start and end time must be specified.
> maxversions - To limit the number of versions of each column to be returned.
> batchsize - To limit the maximum number of values returned for each call to
> next().
> limit - The number of rows to return in the scan operation.
> More on start row, end row and limit parameters.
> 1. If start row, end row and limit are not specified, then the whole table will
> be scanned.
> 2. If start row and limit (say N) are specified, then the scan operation will
> return N rows from the start row specified.
> 3. If only the limit parameter is specified, then the scan operation will return
> N rows from the start of the table.
> 4. If limit and end row are specified, then the scan operation will return N
> rows from the start of the table till the end row. If the end row is
> reached before N rows (say M, with M < N), then M rows will be returned to
> the user.
> 5. If start row, end row and limit (say N) are specified and N < number
> of rows between start row and end row, then N rows from the start row
> will be returned to the user. If N > number of rows between start row and
> end row (say M), then M rows will be returned to the user.
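
The five start row / end row / limit rules quoted above can be sketched in a few lines of Python. This is a toy model, not the patch's implementation: a sorted in-memory list stands in for the table, the function name {{stateless_scan}} is hypothetical, and treating the start row as inclusive and the end row as exclusive is an assumption that the description does not spell out.

```python
from bisect import bisect_left


def stateless_scan(sorted_keys, startrow=None, endrow=None, limit=None):
    """Return the row keys a scan would visit under the five rules above."""
    # Assumption: startrow is inclusive, endrow is exclusive.
    lo = 0 if startrow is None else bisect_left(sorted_keys, startrow)
    hi = len(sorted_keys) if endrow is None else bisect_left(sorted_keys, endrow)
    window = sorted_keys[lo:hi]
    return window if limit is None else window[:limit]


keys = ["r1", "r2", "r3", "r4", "r5"]

print(stateless_scan(keys))                                       # rule 1: whole table
print(stateless_scan(keys, startrow="r2", limit=2))               # rule 2: ['r2', 'r3']
print(stateless_scan(keys, limit=3))                              # rule 3: ['r1', 'r2', 'r3']
print(stateless_scan(keys, endrow="r3", limit=5))                 # rule 4: ['r1', 'r2']
print(stateless_scan(keys, startrow="r2", endrow="r5", limit=2))  # rule 5: ['r2', 'r3']
```

In every case the result is the first min(N, M) rows of the window, where M is the number of rows between the effective start and end rows.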