Configure scanner buffer in bytes instead of number of rows
-----------------------------------------------------------

                 Key: HBASE-1996
                 URL: https://issues.apache.org/jira/browse/HBASE-1996
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: Dave Latham
            Assignee: Dave Latham
             Fix For: 0.21.0


Currently, the default scanner fetches a single row at a time, which makes for 
very slow scans on tables where the rows are not large.  You can change the 
setting (scanner caching) for an HTable instance or for each Scan.
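For reference, the existing row-count-based knobs look roughly like this.  This 
is a sketch against the 0.20-era client API; the table name and caching values 
are illustrative, and running it requires a live cluster:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;

public class ScannerCachingSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Client-wide default, measured in ROWS per fetch:
    conf.setInt("hbase.client.scanner.caching", 100);

    // Per-HTable override, also in rows:
    HTable table = new HTable(conf, "mytable");
    table.setScannerCaching(100);

    // Per-Scan override, also in rows:
    Scan scan = new Scan();
    scan.setCaching(100);
  }
}
```

In every case the unit is a row count, which is exactly the problem: 100 rows 
of a few bytes each is a tiny fetch, while 100 rows of several megabytes each 
can exhaust the client heap.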

It would be better to have a default that performs reasonably well, so that 
people stop running into slow scans because they are evaluating HBase, aren't 
familiar with the setting, or simply forgot to change it.  Unfortunately, if 
we increase the value of the current setting, we run the risk of OOM errors on 
tables with large rows.  Let's change the setting so that it works with a size 
in bytes rather than a number of rows.  That would allow a reasonable default 
under which tables with small rows scan quickly and tables with large rows do 
not run out of memory.

Note that the case is very similar for table writes.  When auto flush is 
disabled, we buffer a list of Puts to commit at once.  That buffer is measured 
in bytes, so a small number of large Puts or many small Puts can each fit in a 
single flush.  If that buffer were measured in number of Puts, it would have 
the same problem we have with the scan buffer: no single default would suit 
tables with different row sizes.  Configuring the scan buffer like the write 
buffer will also make the two more consistent.
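For comparison, here is how the byte-sized write buffer is used today.  Again 
a sketch against the 0.20-era client API (table name, column family, and 
buffer size are illustrative; needs a live cluster to run):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    table.setAutoFlush(false);                  // buffer Puts client-side
    table.setWriteBufferSize(2 * 1024 * 1024);  // buffer measured in BYTES

    for (int i = 0; i < 10000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i));
      // Flushed to the server automatically whenever the byte buffer fills,
      // regardless of whether that takes two large Puts or thousands of
      // small ones:
      table.put(put);
    }
    table.flushCommits();  // flush any remaining buffered Puts
  }
}
```

The proposal is to give the scan buffer the same shape: a byte budget per 
fetch, rather than a fixed row count.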

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.