[ 
https://issues.apache.org/jira/browse/HBASE-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363974#comment-14363974
 ] 

Jonathan Lawlor commented on HBASE-13215:
-----------------------------------------

[~heliangliang] I see, that makes sense to me. The approach outlined in 
HBASE-13090 certainly wouldn't provide control as fine-grained as a raw key 
value limit. 

I think we would want to note in the docs for this feature that the limit 
should only be specified in specific circumstances (such as the use cases you 
have described above). It seems like a nice feature to have for strict control 
over RPCs, but it may cause performance degradation if used without full 
knowledge of the drawbacks of specifying such a limit. By default we would 
probably want this limit to be Long.MAX_VALUE or Integer.MAX_VALUE so that the 
current behavior is preserved.

In terms of saving the scanner position to re-open later, is the position that 
is saved the row key? Does this handle the case where the raw key value limit 
is reached in the middle of a row? Or is the raw key value limit instead 
enforced only in between rows (i.e. after all the cells for a particular row 
have been retrieved then you check the limit and only continue if not reached)? 
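
To make the two options concrete, here is a rough sketch of the semantics I am 
asking about. It is not HBase code: Cell, Batch, next, rawLimit and 
onlyBetweenRows are hypothetical names for illustration, and the list of cells 
stands in for whatever the region scanner actually iterates over. It assumes 
rawLimit >= 1 and that the saved position is simply the index of the next raw 
cell to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RawLimitSketch {
    /** Hypothetical stand-in for a raw cell; not an HBase type. */
    static final class Cell {
        final String row;
        final boolean visible; // false = deleted or filtered out
        Cell(String row, boolean visible) { this.row = row; this.visible = visible; }
    }

    /** Result of one simulated next() call: visible cells plus resume position. */
    static final class Batch {
        final List<Cell> returned;
        final int nextIndex;
        Batch(List<Cell> returned, int nextIndex) { this.returned = returned; this.nextIndex = nextIndex; }
    }

    /**
     * Scans raw cells from 'start' until 'rawLimit' raw cells have been examined.
     * If onlyBetweenRows is true, the limit is only enforced at a row boundary,
     * so a row is never split across calls (but the limit can be overshot).
     */
    static Batch next(List<Cell> cells, int start, long rawLimit, boolean onlyBetweenRows) {
        List<Cell> out = new ArrayList<>();
        long rawSeen = 0;
        int i = start;
        while (i < cells.size()) {
            boolean atRowBoundary = i > 0 && !cells.get(i).row.equals(cells.get(i - 1).row);
            boolean limitReached = rawSeen >= rawLimit;
            if (limitReached && (!onlyBetweenRows || atRowBoundary)) {
                break; // stop here; the caller resumes from index i
            }
            Cell c = cells.get(i);
            rawSeen++; // every raw cell counts, even if it is never returned
            if (c.visible) {
                out.add(c);
            }
            i++;
        }
        return new Batch(out, i);
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("r1", true), new Cell("r1", false), new Cell("r1", true),
            new Cell("r2", true), new Cell("r2", false));
        Batch midRow = next(cells, 0, 2, false);
        Batch betweenRows = next(cells, 0, 2, true);
        Batch unlimited = next(cells, 0, Long.MAX_VALUE, false);
        System.out.println("mid-row resume index: " + midRow.nextIndex);
        System.out.println("between-rows resume index: " + betweenRows.nextIndex);
        System.out.println("unlimited resume index: " + unlimited.nextIndex);
    }
}
```

With mid-row enforcement the saved position has to identify a cell within the 
row, not just a row key. With between-row enforcement a row key is enough, but 
one very wide row can still exceed the limit. Passing Long.MAX_VALUE 
reproduces the current unbounded behavior in this sketch.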

Looking forward to this :)

> A limit on the raw key values is needed for each next call of a scanner
> -----------------------------------------------------------------------
>
>                 Key: HBASE-13215
>                 URL: https://issues.apache.org/jira/browse/HBASE-13215
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>            Reporter: He Liangliang
>            Assignee: He Liangliang
>
> In the current scanner next, there are several limits: caching, batch and 
> size. But there is no limit on raw data scanned, so the time consumed by a 
> next call is unbounded. For example, many consecutive deleted or filtered out 
> cells will lead to a socket timeout. This can make user code get stuck.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
