[ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Rozendaal updated HBASE-1996:
----------------------------------

    Attachment: 1996-0.20.3-v2.patch

Second version of the patch for the 0.20.3 branch. This makes the maximum 
result size configurable.

However: the client and the server *must* use the same maximum result size, 
otherwise rows in a region may be skipped. This is because of the way the 
results of a region scan are reported to the client:

- null: the scan filter stopped processing
- fewer rows returned than requested: the end of the region has been reached, 
move on to the next region

The second point is why the HTable modifications are necessary. With this 
patch it is normal for a region scan to return fewer rows than requested even 
when the end of the region has not been reached yet, so the client needs to 
duplicate the region server's size logic to stay in sync.
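
To make that concrete, here is a rough sketch of the client-side check. This 
is not the literal patch code; the variable names and the byte counting are 
assumptions.

{code:java}
// Sketch only, not the literal patch code. 'values' is the Result[] returned
// by one ScannerCallable call, 'caching' is the number of rows requested and
// 'maxResultSize' is the configured byte limit (same value on client and server).
boolean regionExhausted;
if (values == null) {
  // the scan filter stopped processing, nothing more to fetch
  regionExhausted = true;
} else {
  long bytesReturned = 0;
  for (Result result : values) {
    for (KeyValue kv : result.raw()) {
      bytesReturned += kv.getLength();
    }
  }
  // Fewer rows than requested only means "end of region" when the byte limit
  // was not the reason the server stopped early, so the client re-applies
  // the same size check the server used.
  regionExhausted = values.length < caching && bytesReturned < maxResultSize;
}
{code}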

I think for 0.21 the result communication to the client should be made more 
explicit, e.g. by adding a ScannerCallableResult class that contains a status 
field (MORE_AVAILABLE, SKIP_TO_NEXT_REGION, FILTER_SAID_STOP) as well as the 
actual rows returned.
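
Purely as an illustration of that idea (none of these names exist in the 
attached patch), such a wrapper could look like:

{code:java}
import org.apache.hadoop.hbase.client.Result;

// Hypothetical 0.21-era result wrapper illustrating the suggestion above.
public class ScannerCallableResult {
  public enum Status { MORE_AVAILABLE, SKIP_TO_NEXT_REGION, FILTER_SAID_STOP }

  private final Status status;
  private final Result[] results;

  public ScannerCallableResult(Status status, Result[] results) {
    this.status = status;
    this.results = results;
  }

  public Status getStatus() { return status; }
  public Result[] getResults() { return results; }
}
{code}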

I also left the default max result size at 1 megabyte. In my (admittedly 
limited) testing, using just my laptop without a real network, a size of 
256-1024 kB seems to be optimal.
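
Since client and server have to agree on the value, it would be set through 
the shared configuration, roughly like this (the property name below is an 
assumption on my side, see the patch for the actual key):

{code:java}
// Assumed property name and usage; check the patch for the actual key.
// The same value must also be set in hbase-site.xml on the region servers.
HBaseConfiguration conf = new HBaseConfiguration();
conf.setLong("hbase.client.scanner.max.result.size", 512 * 1024); // 512 kB
HTable table = new HTable(conf, "test_table");
{code}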

Here are my test results:

||max scanner result size (bytes)||MB/s scanned, avg row size 750 bytes||MB/s scanned, avg row size 175 bytes||
|1024|3.23|1.99|
|2048|5.14|3.10|
|4096|7.34|4.67|
|8192|10.95|6.50|
|16384|16.15|8.30|
|32768|18.96|8.50|
|65536|20.42|9.16|
|131072|20.93|9.06|
|262144|21.48|9.49|
|524288|22.34|9.37|
|1048576|22.50|8.91|
|2097152|20.91|8.03|
|4194304|19.86|7.35|
|8388608|17.89|6.83|
|16777216|17.63|6.98|

Scanner caching was set to Integer.MAX_VALUE (an unlimited number of rows). 
MB/s was measured going through a web server, so raw HBase speed is probably 
double that or higher. Obviously a real cluster test should be done to measure 
real performance and to tune the max result size.
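
For reference, the scan side of the measurement looks roughly like this 
(timing and the web server layer omitted):

{code:java}
// Sketch of the scan used for the measurements; MB/s bookkeeping omitted.
Scan scan = new Scan();
scan.setCaching(Integer.MAX_VALUE); // unlimited rows per fetch, limited by size instead
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result result : scanner) {
    // consume the row; bytes scanned are accumulated here for the MB/s numbers
  }
} finally {
  scanner.close();
}
{code}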


> Configure scanner buffer in bytes instead of number of rows
> -----------------------------------------------------------
>
>                 Key: HBASE-1996
>                 URL: https://issues.apache.org/jira/browse/HBASE-1996
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 0.21.0
>
>         Attachments: 1966.patch, 1996-0.20.3-v2.patch, 1996-0.20.3.patch
>
>
> Currently, the default scanner fetches a single row at a time.  This makes 
> for very slow scans on tables where the rows are not large.  You can change 
> the setting for an HTable instance or for each Scan.
> It would be better to have a default that performs reasonably well so that 
> people stop running into slow scans because they are evaluating HBase, aren't 
> familiar with the setting, or simply forgot.  Unfortunately, if we increase 
> the value of the current setting, then we risk running out of memory for 
> tables with large rows.  Let's change the setting so that it works with a 
> size in bytes, rather than in rows.  This will allow us to set a reasonable 
> default so that tables with small rows will scan quickly and tables with 
> large rows will not run out of memory.
> Note that the case is very similar to table writes as well.  When disabling 
> auto flush, we buffer a list of Puts to commit at once.  That buffer is 
> measured in bytes, so that a small number of large Puts or a lot of small 
> Puts can each fit in a single flush.  If that buffer were measured in number 
> of Puts it would have the same problem that we have for the scan buffer, and 
> we wouldn't be able to set a good default value for tables with different 
> row sizes.  Changing the scan buffer to be configured like the write buffer 
> will make it more consistent.
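
(For comparison, the write path referred to in the description already works 
in bytes, while the scan prefetch is still counted in rows. Sketch only:)

{code:java}
// Existing byte-based write buffer vs. today's row-based scan caching.
HBaseConfiguration conf = new HBaseConfiguration();
HTable table = new HTable(conf, "test_table");
table.setAutoFlush(false);                  // buffer Puts on the client
table.setWriteBufferSize(2 * 1024 * 1024);  // flush threshold in bytes

Scan scan = new Scan();
scan.setCaching(100);                       // prefetch is still specified in rows
{code}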

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
