[jira] [Commented] (HBASE-13099) Scans as in DynamoDB

Enis Soztutar (JIRA) Wed, 25 Feb 2015 11:19:46 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337017#comment-14337017
 ]


Enis Soztutar commented on HBASE-13099:
---------------------------------------

I think we may have to keep at least some state on the server, even if we do a
cell-based scanner. Our contract is per-row atomicity, so we have to keep track
of:
1. the read point while scanning inside a row;
2. the low watermark of the read points across all "open" scanners for the
region.
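To make the two pieces of state concrete, here is a minimal Python sketch of a
region that hands a read point to each scanner and computes the low watermark
over the scanners that are still open. The class and method names
(`RegionReadPoints`, `open_scanner`, `close_scanner`, `low_watermark`) are
hypothetical and do not mirror HBase's classes; the sketch assumes read points
are monotonically increasing integer sequence ids.

```python
import itertools


class RegionReadPoints:
    """Hypothetical per-region bookkeeping: one read point per open scanner."""

    def __init__(self, committed_seq_id: int = 0) -> None:
        self.committed_seq_id = committed_seq_id  # highest fully committed write
        self._open: dict[int, int] = {}           # scanner id -> read point
        self._ids = itertools.count(1)

    def commit_write(self) -> int:
        # Each committed write advances the region's sequence id.
        self.committed_seq_id += 1
        return self.committed_seq_id

    def open_scanner(self) -> tuple[int, int]:
        # (1) A new scanner pins the current committed seq id as its read point.
        scanner_id = next(self._ids)
        self._open[scanner_id] = self.committed_seq_id
        return scanner_id, self.committed_seq_id

    def close_scanner(self, scanner_id: int) -> None:
        self._open.pop(scanner_id, None)

    def low_watermark(self) -> int:
        # (2) Smallest read point any open scanner still depends on. With no
        # open scanners, every committed write is visible to everyone.
        return min(self._open.values(), default=self.committed_seq_id)
```

The low watermark only moves forward as the oldest scanners close, which is why
even a "stateless" scan protocol still needs the server to learn when a scanner
is done with a region.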

(1) could even be extended into a region-based contract if we consider atomic
cross-row updates done through the MultiRowMutationEndpoint. (2) is needed so
that we can safely get rid of the seqIds of cells in HFiles.
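The reason (2) matters can be sketched as follows, assuming a compaction pass
that rewrites cells and zeroes a cell's seqId once no scanner could tell it
apart from an older cell, i.e. once its seqId is at or below the region's low
watermark. The `Cell` tuple and `compact` function are hypothetical
illustrations, not HBase's compaction code.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    row: bytes
    value: bytes
    seq_id: int


def compact(cells: list[Cell], low_watermark: int) -> list[Cell]:
    """Hypothetical compaction pass that drops seqIds nobody needs any more.

    A cell whose seq_id is <= the low watermark is visible to every open
    scanner (all read points are >= the watermark) and to every future one,
    so its seq_id no longer affects visibility and can be stored as 0.
    """
    out = []
    for cell in cells:
        if cell.seq_id <= low_watermark:
            out.append(cell._replace(seq_id=0))
        else:
            # Some open scanner may still need the seq_id to hide this cell.
            out.append(cell)
    return out
```

Without a trustworthy low watermark, the only safe choice is to keep every
seqId, which is the disk-usage trade-off discussed below.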

We currently keep (1) on the server side, and the row-based scanner contract is
what gives us (1): the client either gets the whole row or nothing. The scanner
can be restarted across rows, which changes the scanner's read point, but that
is fine since there are no visibility guarantees across rows (excluding
single-region multi-row transactions).
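A minimal sketch of the row-based contract, assuming a toy in-memory store
where each cell carries the seqId of the write that produced it: every cell of
a row is filtered against one read point, and a restarted scan may use a newer
read point for the following rows. `read_row`, `scan_rows`, and the `Store`
layout are hypothetical.

```python
from bisect import bisect_left

# Hypothetical in-memory store: row key -> list of (column, value, seq_id).
Store = dict[bytes, list[tuple[bytes, bytes, int]]]


def read_row(store: Store, row: bytes, read_point: int) -> list[tuple[bytes, bytes]]:
    # All cells of the row are checked against the SAME read point, so a
    # multi-column write to this row is seen entirely or not at all.
    return [(col, val) for col, val, seq in store.get(row, []) if seq <= read_point]


def scan_rows(store: Store, start_row: bytes, read_point: int, limit: int):
    """Return up to `limit` whole rows from start_row, plus the next row key (or None)."""
    keys = sorted(store)
    i = bisect_left(keys, start_row)
    rows = []
    while i < len(keys) and len(rows) < limit:
        cells = read_row(store, keys[i], read_point)
        if cells:
            rows.append((keys[i], cells))
        i += 1
    next_row = keys[i] if i < len(keys) else None
    return rows, next_row
```

Restarting with `scan_rows(store, next_row, newer_read_point, limit)` is safe in
this model because no guarantee spans two rows.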

From a semantics point of view, (1) can be achieved by sending the read point to
the client every time a scan is started within a region. The client will keep
track of one read point per region. Any subsequent scans that the client
performs in that region will also send this read point to the server, so that
the scan does not see partial data. (2) can be solved either by not deleting
the seqIds of cells in HFiles (which we do to optimize disk usage), or by
keeping track of all open scanners' read points, which still requires some
state (even though very small) on the server.
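The client-side part of this proposal can be sketched as below, under the
assumption of a toy server that keeps no per-scanner state and simply honours
whatever read point the client sends back. `FakeRegionServer`, `ScanClient`,
and their methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeRegionServer:
    """Hypothetical stateless server: it keeps no per-scanner state."""
    rows: dict  # row key -> list of (column, value, seq_id)
    committed_seq_id: int = 0

    def scan(self, start_row: bytes, read_point: Optional[int], limit: int):
        # A scan without a read point starts a new session: the server picks
        # its current committed seq id and returns it to the client.
        if read_point is None:
            read_point = self.committed_seq_id
        keys = sorted(k for k in self.rows if k >= start_row)
        batch = []
        for key in keys[:limit]:
            cells = [(c, v) for c, v, s in self.rows[key] if s <= read_point]
            batch.append((key, cells))
        next_row = keys[limit] if len(keys) > limit else None
        return batch, next_row, read_point


@dataclass
class ScanClient:
    server: FakeRegionServer
    read_points: dict = field(default_factory=dict)  # region name -> read point

    def scan_region(self, region: str, start_row: bytes, limit: int):
        rp = self.read_points.get(region)  # None on the first scan in the region
        batch, next_row, rp = self.server.scan(start_row, rp, limit)
        self.read_points[region] = rp      # reused by every later request
        return batch, next_row
```

Note what the sketch leaves out: the server never learns that the client still
depends on its old read point, so nothing stops a compaction from dropping
seqIds the client needs. That gap is exactly point (2).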

> Scans as in DynamoDB
> --------------------
>
>                 Key: HBASE-13099
>                 URL: https://issues.apache.org/jira/browse/HBASE-13099
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Client, regionserver
>            Reporter: Nicolas Liochon
>
> cc: [[email protected]] - as discussed offline.
> DynamoDB has a very simple way to manage scans server side:
> ??citation??
> The data returned from a Query or Scan operation is limited to 1 MB; this 
> means that if you scan a table that has more than 1 MB of data, you'll need 
> to perform another Scan operation to continue to the next 1 MB of data in the 
> table.
> If you query or scan for specific attributes that match values that amount to 
> more than 1 MB of data, you'll need to perform another Query or Scan request 
> for the next 1 MB of data. To do this, take the LastEvaluatedKey value from 
> the previous request, and use that value as the ExclusiveStartKey in the next 
> request. This will let you progressively query or scan for new data in 1 MB 
> increments.
> When the entire result set from a Query or Scan has been processed, the 
> LastEvaluatedKey is null. This indicates that the result set is complete 
> (i.e. the operation processed the “last page” of data).
> ??citation??
> This means that there is no server-side state: the work is done on the client side.
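The DynamoDB-style paging quoted in the issue can be sketched as a stateless
page function plus a client loop. This is not boto3 or the DynamoDB API; the
names `scan_page` and `scan_all` are hypothetical, and only the roles of
`ExclusiveStartKey` and `LastEvaluatedKey` are borrowed from the quoted text.
Item size is approximated as key length plus value length.

```python
from typing import Optional

ONE_MB = 1024 * 1024


def scan_page(table: dict[str, bytes], exclusive_start_key: Optional[str],
              max_bytes: int = ONE_MB):
    """Return (items, last_evaluated_key); the server remembers nothing between calls."""
    items = []
    used = 0
    last_key = None
    for key in sorted(table):
        if exclusive_start_key is not None and key <= exclusive_start_key:
            continue  # resume strictly after the previous page
        size = len(key) + len(table[key])
        if items and used + size > max_bytes:
            return items, last_key  # page full: tell the client where we stopped
        items.append((key, table[key]))
        used += size
        last_key = key
    return items, None  # None means the result set is complete


def scan_all(table: dict[str, bytes], max_bytes: int = ONE_MB):
    # Client-side loop: feed LastEvaluatedKey back as ExclusiveStartKey.
    start = None
    while True:
        page, start = scan_page(table, start, max_bytes)
        yield from page
        if start is None:
            break
```

Each page here is an independent read with no shared snapshot, which is why the
comment above argues that HBase would still need a read point to keep per-row
atomicity.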



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
