[ 
https://issues.apache.org/jira/browse/HBASE-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505836#comment-14505836
 ] 

Jonathan Lawlor commented on HBASE-13442:
-----------------------------------------

bq. I don't see why an app should care specifically about how many rows the 
client transfers from the server in each RPC - bytes seem the more relevant 
currency to tune for performance.

Really good point, I can't think of such a scenario either. Certainly we want 
to return results from the server on the basis of size rather than some 
arbitrary number of rows (since row size can vary table to table, there isn't a 
universally "good" row limit). This is supported by the move to the default 
configurations of (caching = Integer.MAX_VALUE, maxResultSize = 2 MB). So the 
best course of action here wouldn't be to rename caching, but rather to 
deprecate it so that it can eventually be removed completely in favor of 
rowLimit.
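
To make the size-versus-rows argument concrete, here is a rough, self-contained 
sketch. It is not HBase code: the class and method names (ScanBatchSketch, 
rowsInOneRpc, uniformRows) are made up for illustration, and it assumes the 
server keeps adding rows to a response until either the row limit or the byte 
budget has been reached.

```java
import java.util.Arrays;

// Hypothetical model of how one scan RPC might be filled; not taken from HBase.
class ScanBatchSketch {

    // Counts the rows returned in a single RPC, assuming the server stops
    // once it has returned `caching` rows or accumulated at least
    // `maxResultSizeBytes` bytes, whichever happens first.
    public static int rowsInOneRpc(long[] rowSizesBytes, int caching, long maxResultSizeBytes) {
        int rows = 0;
        long bytes = 0;
        for (long size : rowSizesBytes) {
            if (rows >= caching || bytes >= maxResultSizeBytes) {
                break;
            }
            rows++;
            bytes += size;
        }
        return rows;
    }

    // Builds a table model in which every row has the same size.
    public static long[] uniformRows(int count, long sizeBytes) {
        long[] rows = new long[count];
        Arrays.fill(rows, sizeBytes);
        return rows;
    }

    public static void main(String[] args) {
        long[] narrow = uniformRows(100_000, 100L);   // 100-byte rows
        long[] wide = uniformRows(100, 1L << 20);     // 1 MB rows
        long twoMb = 2L << 20;

        // A fixed row limit of 100 with no byte budget: the same setting
        // yields about 10 KB per RPC on one table and 100 MB on the other.
        System.out.println(rowsInOneRpc(narrow, 100, Long.MAX_VALUE));
        System.out.println(rowsInOneRpc(wide, 100, Long.MAX_VALUE));

        // Size-based limit (caching = Integer.MAX_VALUE, maxResultSize = 2 MB):
        // the row count adapts so each RPC carries roughly 2 MB.
        System.out.println(rowsInOneRpc(narrow, Integer.MAX_VALUE, twoMb));
        System.out.println(rowsInOneRpc(wide, Integer.MAX_VALUE, twoMb));
    }
}
```

Under these assumptions, a single row limit gives wildly different payloads 
depending on the table, while the byte budget keeps each RPC at about the same 
size regardless of row width.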

The feature in the protocol that allows the client to ask for a certain number 
of rows would remain, but would only be used for backwards compatibility and 
for the scenario where the client wants to limit itself to a certain number of 
rows. Makes sense to me.

With such a change, we would also want to remove any associated configurations 
for caching/rowLimit in hbase-site.xml and hbase-default.xml. There isn't a 
scenario (at least that I can think of) where it would be appropriate to limit 
all scans to a particular number of rows and then close them. The row limit 
would be like the startRow or stopRow settings on scans: configured on a 
per-scan basis, with no means to set a global default for all scans.

> Rename scanner caching to a more semantically correct term such as row limit
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-13442
>                 URL: https://issues.apache.org/jira/browse/HBASE-13442
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jonathan Lawlor
>         Attachments: HBASE-13442-proposal.diff
>
>
> Caching acts more as a row limit now. By default in branch-1+, a Scan is 
> configured with (caching=Integer.MAX_VALUE, maxResultSize=2MB) so that we 
> service scans on the basis of buffer size rather than number of rows. As a 
> result, caching should now only be configured in instances where the user 
> knows that they will only need X rows. Thus, caching should be renamed to 
> something that is more semantically correct such as rowLimit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
