[ 
https://issues.apache.org/jira/browse/HBASE-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022376#comment-16022376
 ] 

Duo Zhang commented on HBASE-15576:
-----------------------------------

In the current patch, we use Result.isCursor to determine if this is a cursor, 
and then use Result.getRow to get the rowkey which means that the cursor can 
only be a rowkey? I would suggest that we return a data structure, or at least, 
a byte array with no meanings to user and introduce Scan.createFromCursor 
method to create a Scan object. This is better for us to change the content of 
the cursor in the future to include mvcc or make the cursor start at the middle 
of a row.

 And maybe we could wrapper the cursor with an exception when using 
ResultScanner to keep the API of Result clean? And for async client, we could 
change the RawScanResultConsumer to pass the cursor when calling onHeartbeat?

Thanks.

> Scanning cursor to prevent blocking long time on ResultScanner.next()
> ---------------------------------------------------------------------
>
>                 Key: HBASE-15576
>                 URL: https://issues.apache.org/jira/browse/HBASE-15576
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-15576.v01.patch
>
>
> After 1.1.0 released, we have partial and heartbeat protocol in scanning to 
> prevent responding large data or timeout. Now for ResultScanner.next(), we 
> may block for longer time larger than timeout settings to get a Result if the 
> row is very large, or filter is sparse, or there are too many delete markers 
> in files.
> However, in some scenes, we don't want it to be blocked for too long. For 
> example, a web service which handles requests from mobile devices whose 
> network is not stable and we can not set timeout too long(eg. only 5 seconds) 
> between mobile and web service. This service will scan rows from HBase and 
> return it to mobile devices. In this scene, the simplest way is to make the 
> web service stateless. Apps in mobile devices will send several requests one 
> by one to get the data until enough just like paging a list. In each request 
> it will carry a start position which depends on the last result from web 
> service. Different requests can be sent to different web service server 
> because it is stateless.
> Therefore, the stateless web service need a cursor from HBase telling where 
> we have scanned in RegionScanner when HBase client receives an empty 
> heartbeat. And the service will return the cursor to mobile device although 
> the response has no data. In next request we can start at the position of 
> cursor, without the cursor we have to scan from last returned result and we 
> may timeout forever. And of course even if the heartbeat message is not empty 
> we can still use cursor to prevent re-scan the same rows/cells which has beed 
> skipped.
> Obviously, we will give up consistency for scanning because even HBase client 
> is also stateless, but it is acceptable in this scene. And maybe we can keep 
> mvcc in cursor so we can get a consistent view?
> HBASE-13099 had some discussion, but it has no further progress by now.
> API:
> In Scan we need a new method setNeedCursorResult(true) to get the cursor row 
> key when there is a RPC response but client can not return any Result. In 
> this mode we will not block ResultScanner.next() longer than this timeout 
> setting.
> {code}
> while (r = scanner.next() && r != null) {
>   if(r.isCursor()){
>   // scanning is not end, it is a cursor, save its row key and close scanner 
> if you want, or
>   // just continue the loop to call next().
>   } else {
>   // just like before
>   }
> }
> // scanning is end
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to