Phil Yang created HBASE-15576:
---------------------------------
Summary: Support stateless scanning and scanning cursor
Key: HBASE-15576
URL: https://issues.apache.org/jira/browse/HBASE-15576
Project: HBase
Issue Type: New Feature
Reporter: Phil Yang
Assignee: Phil Yang
Since the 1.1.0 release, scanning has used the partial-result and heartbeat
protocols to avoid returning oversized responses or timing out. Even so,
ResultScanner.next() may block for longer than the configured timeout before
it returns a Result if the row is very large, the filter is sparse, or the
files contain too many delete markers.
However, in some scenarios we don't want it to block for too long. For
example, consider a web service that handles requests from mobile devices whose
network is unstable, so the timeout between the mobile device and the web
service cannot be long (e.g. only 5 seconds). This service scans rows from
HBase and returns them to the mobile devices. In this scenario, the simplest
approach is to make the web service stateless. The mobile app sends several
requests one after another until it has enough data, just like paging through
a list. Each request carries a start position that depends on the last result
from the web service. Because the service is stateless, different requests can
be sent to different web service servers.
Therefore, the stateless web service needs a cursor from HBase that tells it
how far the RegionScanner has scanned when the HBase client receives an empty
heartbeat. The service returns the cursor to the mobile device even though the
response has no data. The next request can then start at the cursor's position.
Without the cursor we would have to scan again from the last returned result,
and we might time out forever. And of course, even if the heartbeat message is
not empty, we can still use the cursor to avoid re-scanning rows/cells that
have already been skipped.
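The paging flow described above can be sketched with a toy model. This is not HBase code: a TreeMap stands in for a region, a budget of examined rows stands in for the per-request timeout, and the names StatelessScanSketch, Page, scanOnce and scanAll are hypothetical. It only illustrates why a cursor returned with an empty response lets the next request make progress under a sparse filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model only: none of these names are HBase APIs.
class StatelessScanSketch {

    /** One response: matching rows plus the cursor to resume from (null when the scan is finished). */
    static final class Page {
        final List<String> rows;
        final String cursor;

        Page(List<String> rows, String cursor) {
            this.rows = rows;
            this.cursor = cursor;
        }
    }

    /** Scans rows strictly after startAfter (null means from the beginning), examining at most budget rows. */
    static Page scanOnce(TreeMap<String, String> table, String startAfter,
                         Predicate<String> filter, int budget) {
        if (budget < 1) {
            throw new IllegalArgumentException("budget must be at least 1");
        }
        Map<String, String> view = (startAfter == null) ? table : table.tailMap(startAfter, false);
        List<String> rows = new ArrayList<>();
        String lastExamined = null;
        int examined = 0;
        for (Map.Entry<String, String> e : view.entrySet()) {
            if (examined == budget) {
                // Budget used up with rows still remaining: return a cursor, possibly with no data.
                return new Page(rows, lastExamined);
            }
            lastExamined = e.getKey();
            examined++;
            if (filter.test(e.getValue())) {
                rows.add(e.getKey());
            }
        }
        // Reached the end of the table: no cursor, the scan is finished.
        return new Page(rows, null);
    }

    /** Issues independent requests, each carrying only the previous cursor, until the scan is finished. */
    static List<String> scanAll(TreeMap<String, String> table, Predicate<String> filter, int budget) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        do {
            Page page = scanOnce(table, cursor, filter, budget);
            all.addAll(page.rows);
            cursor = page.cursor;
        } while (cursor != null);
        return all;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            // Sparse filter case: only the last row matches.
            table.put(String.format("row%02d", i), i == 9 ? "keep" : "skip");
        }
        System.out.println(scanAll(table, v -> v.equals("keep"), 3)); // prints [row09]
    }
}
```

In this model the first three requests return no rows but advance the cursor (row02, row05, row08). If each request instead restarted from the last returned row, it would re-examine the same three rows every time and never reach row09.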
Obviously, we give up consistency for scanning, because even the HBase client
is stateless, but that is acceptable in this scenario. Maybe we could keep the
mvcc in the cursor so that we can get a consistent view?
HBASE-13099 had some discussion, but it has made no further progress so far.
API:
In Scan we need a new method, setStateless, to make the scan stateless, plus
a separate timeout setting for stateless scanning. In this mode
ResultScanner.next() will not block for longer than this timeout. next() will
return Results as usual, but the last Result (or the only Result, if we receive
an empty heartbeat) carries a special flag that marks it as a cursor. The
cursor Result has only one Cell. Users can scan like this:
{code}
// isCursor() is the flag proposed in this issue, not an existing API.
Result r;
while ((r = scanner.next()) != null && !r.isCursor()) {
  // process r just like before
}
if (r != null) {
  // scanning has not ended; r is a cursor
} else {
  // scanning has ended
}
scanner.close();
{code}
Implementation:
We have two options for supporting stateless scanning:
1. Use only one RPC, like a small scan, with no support for batch/partials and
a row-level cursor. This is simple to implement.
2. Support big scans with several RPC requests, with support for
batch/partials and a cell-level cursor. This is a little more complex because
we need to seek on the server side.
Or we could do it in two phases, supporting the one-shot option first?
Any thoughts? Thanks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)