[ 
https://issues.apache.org/jira/browse/HBASE-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336928#comment-14336928
 ] 

Jonathan Lawlor commented on HBASE-13099:
-----------------------------------------

Interesting idea. This seems like it would make the client-server interaction 
during Scans much cleaner. Instead of the client assuming that the server knows 
the state the client thinks it is in, each request would be explicit, along 
the lines of "I am in this state, give me these Results".

We would probably want the LastEvaluatedKey to be an extra parameter in the RPC 
response, rather than assumed to be the last KV in the Result. I think this 
would be preferable because it is possible that keys further down in the table 
were evaluated but filtered out. If we assume it to be the last KV in the 
Result, we may find that we are constantly rescanning KVs that were previously 
excluded, only to find out that they will still be excluded.
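
To make the point about LastEvaluatedKey concrete, here is a rough sketch in
Python (a toy in-memory model, not HBase code). The names scan_page, scan_all,
ScanPage, and max_evaluated are made up for illustration. The "server" returns
the last key it evaluated separately from the rows that passed the filter, so
the client resumes after that key even when a page comes back empty.

```python
from bisect import bisect_right
from typing import Callable, NamedTuple, Optional


class ScanPage(NamedTuple):
    results: list                      # rows that passed the filter
    last_evaluated_key: Optional[str]  # None once the table is exhausted
    rows_evaluated: int                # rows the "server" looked at for this page


def scan_page(table: dict, exclusive_start_key: Optional[str],
              row_filter: Callable, max_evaluated: int) -> ScanPage:
    """Stateless toy 'server' call: every request carries its own start key."""
    keys = sorted(table)
    start = 0 if exclusive_start_key is None else bisect_right(keys, exclusive_start_key)
    window = keys[start:start + max_evaluated]
    results = [(k, table[k]) for k in window if row_filter(k, table[k])]
    exhausted = start + len(window) >= len(keys)
    # Report the last key *evaluated*, which may be past the last key returned.
    last = None if exhausted else window[-1]
    return ScanPage(results, last, len(window))


def scan_all(table: dict, row_filter: Callable, max_evaluated: int):
    """Client loop: feed each LastEvaluatedKey back as the next exclusive start key."""
    rows, total_evaluated, start_key = [], 0, None
    while True:
        page = scan_page(table, start_key, row_filter, max_evaluated)
        rows.extend(page.results)
        total_evaluated += page.rows_evaluated
        if page.last_evaluated_key is None:
            return rows, total_evaluated
        start_key = page.last_evaluated_key


# Ten rows, of which only the first three pass the filter.
table = {f"row{i:02d}": i for i in range(10)}
rows, evaluated = scan_all(table, lambda key, value: value < 3, max_evaluated=4)
```

In this toy run the second page returns no rows at all, yet the scan still
advances, and each of the ten rows is evaluated exactly once. If the client
could only resume from the last KV it received, it would have nothing to
resume from after an empty page and would re-evaluate the excluded rows.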

Moving the state from the server to the client would require adding more 
parameters into the RPC response. As mentioned above, LastEvaluatedKey would 
likely be one of the parameters. Another parameter would likely be the MVCC 
read point that is currently maintained within the RegionScanner.

While this would make the interactions cleaner, I wonder how it would affect 
the performance of Scans. As I currently imagine it (correct me if I'm wrong), 
we would incur extra overhead on each scan RPC due to the additional 
server-side initialization. On each scan RPC we would need to create a new 
RegionScanner, set up the key value heaps, seek to the correct row, and then 
potentially filter out the key values that we have already evaluated. This 
overhead is currently avoided by sending the open scanner id from the client 
to the server, so that the already set-up scanner just continues where it 
left off.
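
A toy model of that overhead, again in Python and purely illustrative: the
class ToyRegion and its methods are hypothetical, and a simple counter stands
in for the real cost of building the heaps and seeking. It only counts how
many times a scanner has to be set up under each approach.

```python
import itertools


class ToyRegion:
    """Toy stand-in for a region; counts scanner setups instead of doing real work."""

    def __init__(self, rows):
        self.rows = sorted(rows)
        self.scanner_setups = 0
        self._open = {}                 # scanner id -> open iterator (server-side state)
        self._ids = itertools.count(1)

    def _new_scanner(self, exclusive_start_key):
        # Stands in for creating the scanner, building heaps, and seeking.
        self.scanner_setups += 1
        return iter([r for r in self.rows
                     if exclusive_start_key is None or r > exclusive_start_key])

    def next_stateful(self, scanner_id, batch):
        """Server keeps the scanner open; the client only sends its id."""
        if scanner_id is None:
            scanner_id = next(self._ids)
            self._open[scanner_id] = self._new_scanner(None)
        return scanner_id, list(itertools.islice(self._open[scanner_id], batch))

    def next_stateless(self, exclusive_start_key, batch):
        """Server rebuilds the scanner on every call from the client's start key."""
        scanner = self._new_scanner(exclusive_start_key)
        return list(itertools.islice(scanner, batch))


def count_setups(num_rows, batch):
    keys = [f"row{i:03d}" for i in range(num_rows)]

    stateful = ToyRegion(keys)
    scanner_id, rows = stateful.next_stateful(None, batch)
    while rows:
        scanner_id, rows = stateful.next_stateful(scanner_id, batch)

    stateless = ToyRegion(keys)
    rows = stateless.next_stateless(None, batch)
    while rows:
        # No filter in this toy, so the last returned row is the last evaluated key.
        rows = stateless.next_stateless(rows[-1], batch)

    return stateful.scanner_setups, stateless.scanner_setups
```

With 10 rows and a batch of 4, count_setups(10, 4) gives one setup for the
open-scanner approach and four for the stateless one (one per RPC, including
the final empty call). How expensive each setup is in a real RegionServer is
exactly the open question here; the sketch says nothing about that.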

If the move to client-side-state could be done without incurring any 
performance loss, I think this would be a great improvement that would make 
scans easier to understand.

> Scans as in DynamoDB
> --------------------
>
>                 Key: HBASE-13099
>                 URL: https://issues.apache.org/jira/browse/HBASE-13099
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Client, regionserver
>            Reporter: Nicolas Liochon
>
> cc: [[email protected]] - as discussed offline.
> DynamoDB has a very simple way to manage scans server side:
> ??citation??
> The data returned from a Query or Scan operation is limited to 1 MB; this 
> means that if you scan a table that has more than 1 MB of data, you'll need 
> to perform another Scan operation to continue to the next 1 MB of data in the 
> table.
> If you query or scan for specific attributes that match values that amount to 
> more than 1 MB of data, you'll need to perform another Query or Scan request 
> for the next 1 MB of data. To do this, take the LastEvaluatedKey value from 
> the previous request, and use that value as the ExclusiveStartKey in the next 
> request. This will let you progressively query or scan for new data in 1 MB 
> increments.
> When the entire result set from a Query or Scan has been processed, the 
> LastEvaluatedKey is null. This indicates that the result set is complete 
> (i.e. the operation processed the “last page” of data).
> ??citation??
> This means that there is no state server side: the work is done client side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
