Samarth Jain created PHOENIX-2189:
-------------------------------------
Summary: Starting from HBase 1.x, Phoenix probably shouldn't override the hbase.client.scanner.caching attribute
Key: PHOENIX-2189
URL: https://issues.apache.org/jira/browse/PHOENIX-2189
Project: Phoenix
Issue Type: Bug
Reporter: Samarth Jain
After PHOENIX-2188 is fixed, we need to think about whether it still makes sense for Phoenix to override the scanner cache size on the HBase 1.x branches. For example, in HBase 1.1 the default value of hbase.client.scanner.caching is now Integer.MAX_VALUE:
{code:xml}
<property>
  <name>hbase.client.scanner.caching</name>
  <value>2147483647</value>
  <description>Number of rows that we try to fetch when calling next
    on a scanner if it is not served from (local, client) memory. This configuration
    works together with hbase.client.scanner.max.result.size to try and use the
    network efficiently. The default value is Integer.MAX_VALUE by default so that
    the network will fill the chunk size defined by hbase.client.scanner.max.result.size
    rather than be limited by a particular number of rows since the size of rows varies
    table to table. If you know ahead of time that you will not require more than a certain
    number of rows from a scan, this configuration should be set to that row limit via
    Scan#setCaching. Higher caching values will enable faster scanners but will eat up more
    memory and some calls of next may take longer and longer times when the cache is empty.
    Do not set this value such that the time between invocations is greater than the scanner
    timeout; i.e. hbase.client.scanner.timeout.period</description>
</property>
{code}
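To make the interaction between the two settings concrete, here is a simplified Java model of how many rows a single scanner RPC returns when both a row limit (hbase.client.scanner.caching) and a byte limit (hbase.client.scanner.max.result.size) apply. This is an assumption for illustration, not HBase's actual server logic: rows are taken to be of uniform size, and the 2 MB byte limit and the row sizes are hypothetical example values.

{code:java}
// Simplified model (an assumption, not HBase's real scanner code): rows of a
// uniform size are added to a batch until either the row limit (scanner
// caching) or the byte limit (max result size) is reached. The row that
// crosses the byte limit is still included, hence the ceiling division.
final class ScannerBatchModel {

    static long rowsPerBatch(long caching, long maxResultSizeBytes, long rowSizeBytes) {
        if (caching <= 0 || rowSizeBytes <= 0) {
            throw new IllegalArgumentException("caching and row size must be positive");
        }
        // Number of rows needed for the accumulated size to reach the byte limit.
        long rowsToFillBytes = (maxResultSizeBytes + rowSizeBytes - 1) / rowSizeBytes;
        // Whichever limit is hit first decides the batch size.
        return Math.min(caching, rowsToFillBytes);
    }

    public static void main(String[] args) {
        long maxResultSize = 2L * 1024 * 1024; // hypothetical 2 MB byte limit
        long narrowRow = 100;                  // hypothetical narrow row, in bytes
        long wideRow = 64 * 1024;              // hypothetical wide row, in bytes

        for (long caching : new long[] {1000, Integer.MAX_VALUE}) {
            System.out.printf("caching=%d narrow=%d wide=%d%n",
                    caching,
                    rowsPerBatch(caching, maxResultSize, narrowRow),
                    rowsPerBatch(caching, maxResultSize, wideRow));
        }
    }
}
{code}

Under these assumptions, a caching value of 1000 caps each batch of narrow rows at 1000, while Integer.MAX_VALUE lets the byte limit fill the batch with 20972 rows; wide rows get 32 rows per batch either way, because the byte limit binds first.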
From the description, it sounds like HBase by default bounds each scanner batch by its size in bytes rather than by a number of rows.
If we keep overriding hbase.client.scanner.caching to 1000, then for narrow rows we will likely fetch too few rows per round trip, because the 1000-row limit is reached long before the byte limit. For wide rows, the byte limit will likely kick in first anyway, so it already keeps us from caching too much on the client.
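A rough way to see the effect on a whole scan is to count round trips. The sketch below is a hypothetical simulation, not HBase's client implementation: it assumes uniform row sizes, a 2 MB byte limit and a 1,000,000-row table, all chosen purely for illustration.

{code:java}
// Hypothetical simulation (not HBase's client code) of a full scan: each RPC
// keeps adding uniform-size rows until it holds `caching` rows or its
// accumulated size reaches the byte limit, and the scan repeats until no
// rows remain.
final class ScanRpcCount {

    static int countRpcs(int totalRows, int caching, long maxResultSizeBytes, int rowSizeBytes) {
        if (caching <= 0 || rowSizeBytes <= 0) {
            throw new IllegalArgumentException("caching and row size must be positive");
        }
        int rpcs = 0;
        int remaining = totalRows;
        while (remaining > 0) {
            rpcs++; // one round trip to the region server
            int rowsInBatch = 0;
            long bytesInBatch = 0;
            // Fill this batch until the row limit or the byte limit is hit.
            while (remaining > 0 && rowsInBatch < caching && bytesInBatch < maxResultSizeBytes) {
                rowsInBatch++;
                bytesInBatch += rowSizeBytes;
                remaining--;
            }
        }
        return rpcs;
    }

    public static void main(String[] args) {
        int totalRows = 1_000_000;             // hypothetical table size
        long maxResultSize = 2L * 1024 * 1024; // hypothetical 2 MB byte limit
        for (int rowSize : new int[] {100, 64 * 1024}) {
            System.out.printf("rowSize=%d caching=1000 -> %d RPCs, caching=MAX -> %d RPCs%n",
                    rowSize,
                    countRpcs(totalRows, 1000, maxResultSize, rowSize),
                    countRpcs(totalRows, Integer.MAX_VALUE, maxResultSize, rowSize));
        }
    }
}
{code}

Under these assumptions, the 100-byte rows need 1000 RPCs with the 1000-row override versus 48 without it, while the 64 KB rows need 31250 RPCs in both cases, which is the asymmetry described above.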
Maybe we shouldn't be using the scanner caching override at all? Thoughts?
[~jamestaylor], [~lhofhansl]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)