[ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794863#action_12794863 ]

Dave Latham commented on HBASE-1996:
------------------------------------

Perhaps if we supported both settings i.e. buffer at least X rows and at least Y bytes. Too much complexity? We could still default to min 1 row and 0 bytes to minimize the chance of a scanner timeout if needed.

> Configure scanner buffer in bytes instead of number of rows
> -----------------------------------------------------------
>
>                 Key: HBASE-1996
>                 URL: https://issues.apache.org/jira/browse/HBASE-1996
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 0.21.0
>
>         Attachments: 1966.patch
>
>
> Currently, the default scanner fetches a single row at a time. This makes
> for very slow scans on tables where the rows are not large. You can change
> the setting for an HTable instance or for each Scan.
> It would be better to have a default that performs reasonably well so that
> people stop running into slow scans because they are evaluating HBase, aren't
> familiar with the setting, or simply forgot. Unfortunately, if we increase
> the value of the current setting, then we run the risk of running OOM for
> tables with large rows. Let's change the setting so that it works with a
> size in bytes, rather than in rows. This will allow us to set a reasonable
> default so that tables with small rows will scan performantly and tables with
> large rows will not run OOM.
> Note that the case is very similar to table writes as well. When disabling
> auto flush, we buffer a list of Put's to commit at once. That buffer is
> measured in bytes, so that a small number of large Puts or a lot of small
> Puts can each fit in a single flush. If that buffer were measured in number
> of Put's it would have the same problem that we have for the scan buffer, and
> we wouldn't be able to set a good default value for tables with different
> size rows. Changing the scan buffer to be configured like the write buffer
> will make it more consistent.
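
To make the "at least X rows and at least Y bytes" idea concrete, here is a minimal Java sketch of how such a batch cutoff might be computed. It is not HBase code: the class ScanBufferSketch, the method rowsInBatch, and the parameters minRows and minBytes are hypothetical names made up for this illustration, and row sizes are passed in as a plain array rather than read from a region server.

```java
public class ScanBufferSketch {

    /**
     * Hypothetical helper: counts how many rows, taken from the front of
     * rowSizes, one fetch would buffer if it keeps adding rows until BOTH
     * minimums are met (at least minRows rows and at least minBytes bytes),
     * or until the rows run out.
     */
    public static int rowsInBatch(int[] rowSizes, int minRows, long minBytes) {
        int rows = 0;
        long bytes = 0;
        while (rows < rowSizes.length && (rows < minRows || bytes < minBytes)) {
            bytes += rowSizes[rows];
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] smallRows = {100, 100, 100, 100, 100};
        int[] largeRows = {5_000_000, 5_000_000};

        // Proposed default (min 1 row, 0 bytes): one row per fetch.
        System.out.println(rowsInBatch(smallRows, 1, 0));   // prints 1

        // A byte minimum lets small rows batch up...
        System.out.println(rowsInBatch(smallRows, 1, 250)); // prints 3

        // ...while a single large row already satisfies it.
        System.out.println(rowsInBatch(largeRows, 1, 250)); // prints 1
    }
}
```

With the defaults suggested in the comment (1 row, 0 bytes) the sketch returns a single row per fetch, which matches the current behavior described in the issue. Raising only the byte minimum batches small rows together without pulling in more than one large row.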

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.