[
https://issues.apache.org/jira/browse/HBASE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066968#comment-14066968
]
Lars Hofhansl commented on HBASE-11544:
---------------------------------------
# Or better, the server sends reasonable chunks of data and the client requests
(or awaits) more data as needed for the API (for example, to present a complete
row). Anything from 1k to 128k should be a good chunk size; 64k seems fine. (A
sketch follows this list.)
# Even better: the server streams data to the client, the client passes data up
to the caller when it makes sense (i.e. enough data was received to fill a row),
and continues to stream if there's more data. That would keep the network pipe
full even for a single client.
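As a hedged illustration of #1 (not actual HBase server code; the class and
method names below are made up), the serving loop would budget by accumulated
bytes rather than by row count:
{code:java}
import java.util.ArrayList;
import java.util.List;

class ChunkedScanSketch {
  // 64k default, per the suggestion above; anything from 1k to 128k is plausible
  static final long CHUNK_SIZE_BYTES = 64 * 1024;

  /** Accumulate cells until the byte budget is spent, then return the chunk. */
  List<byte[]> nextChunk(CellSource scanner) {
    List<byte[]> chunk = new ArrayList<>();
    long accumulated = 0;
    byte[] cell;
    while (accumulated < CHUNK_SIZE_BYTES && (cell = scanner.nextCell()) != null) {
      chunk.add(cell);
      accumulated += cell.length; // size-based cutoff, not row-based
    }
    return chunk; // client calls again until it has enough to complete a row
  }

  /** Stand-in for a region scanner; purely illustrative. */
  interface CellSource {
    byte[] nextCell(); // null when exhausted
  }
}
{code}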
This whole affair is about latency vs. bandwidth. It has nothing to do with the
number of rows (that is a bad proxy for fixing the issue), but only with how
many bytes get sent per RPC request. Too few: bad performance, as RTT starts to
dominate. Too many: OOM, as too much data has to be materialized per RPC
request.
#1 is a good start. Eventually we need to get to #2.
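Assuming the deployed release already has the client-side Scan#setMaxResultSize
knob (an assumption about the version at hand), the byte budget can be expressed
today from the client; a minimal sketch:
{code:java}
import org.apache.hadoop.hbase.client.Scan;

public class SizeBoundedScan {
  public static Scan build() {
    Scan scan = new Scan();
    scan.setCaching(1000);             // row-count hint alone is the bad proxy
    scan.setMaxResultSize(64 * 1024L); // cap the bytes materialized per RPC
    return scan;
  }
}
{code}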
> [Ergonomics] hbase.client.scanner.caching is dogged and will try to return
> batch even if it means OOME
> ------------------------------------------------------------------------------------------------------
>
> Key: HBASE-11544
> URL: https://issues.apache.org/jira/browse/HBASE-11544
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Labels: noob
>
> Running some tests, I set hbase.client.scanner.caching=1000. The dataset has
> large cells. I kept OOME'ing.
> Server-side, we should measure how much we've accumulated and return to the
> client whatever we've gathered once we pass a certain size threshold, rather
> than keep accumulating till we OOME.
--
This message was sent by Atlassian JIRA
(v6.2#6252)