[ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795579#action_12795579 ]
Erik Rozendaal commented on HBASE-1996:
---------------------------------------

@Andrew: HBASE-1537 will work pretty well when KeyValues are of similar/predictable size. However, I prefer to be able to set a limit in bytes. This should give more predictable performance, especially when you have widely varying row/KeyValue sizes.

> Configure scanner buffer in bytes instead of number of rows
> -----------------------------------------------------------
>
> Key: HBASE-1996
> URL: https://issues.apache.org/jira/browse/HBASE-1996
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: Dave Latham
> Assignee: Dave Latham
> Fix For: 0.21.0
>
> Attachments: 1966.patch, 1996-0.20.3-v2.patch, 1996-0.20.3.patch
>
>
> Currently, the default scanner fetches a single row at a time. This makes
> for very slow scans on tables where the rows are not large. You can change
> the setting for an HTable instance or for each Scan.
>
> It would be better to have a default that performs reasonably well so that
> people stop running into slow scans because they are evaluating HBase, aren't
> familiar with the setting, or simply forgot. Unfortunately, if we increase
> the value of the current setting, then we run the risk of running OOM for
> tables with large rows. Let's change the setting so that it works with a
> size in bytes, rather than in rows. This will allow us to set a reasonable
> default so that tables with small rows will scan performantly and tables with
> large rows will not run OOM.
>
> Note that the case is very similar to table writes as well. When disabling
> auto flush, we buffer a list of Puts to commit at once. That buffer is
> measured in bytes, so that a small number of large Puts or a lot of small
> Puts can each fit in a single flush. If that buffer were measured in number
> of Puts it would have the same problem that we have for the scan buffer, and
> we wouldn't be able to set a good default value for tables with different
> size rows.
> Changing the scan buffer to be configured like the write buffer
> will make it more consistent.
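The core idea in the issue, filling a buffer until a byte budget is reached rather than until a row count is reached, can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not HBase's implementation: rows are modeled as lists of (key, value) byte pairs, a row's size is approximated as its total key plus value bytes, and `fill_scan_buffer`, `scan_in_batches` and `WriteBuffer` are hypothetical names. The scanner side always returns at least one row so that a single row larger than the budget still makes progress; the write side mirrors the byte-measured Put buffer the issue cites for consistency.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical model: a row is a list of (key, value) byte pairs, loosely
# standing in for a row's KeyValues. Not HBase's actual client API.
Row = List[Tuple[bytes, bytes]]


def row_size(row: Row) -> int:
    """Approximate a row's size as its total key + value bytes (assumption)."""
    return sum(len(k) + len(v) for k, v in row)


def fill_scan_buffer(rows: Iterator[Row], max_bytes: int) -> List[Row]:
    """Pull rows from `rows` until the buffered size reaches max_bytes.

    The size check happens after each append, so at least one row is
    returned whenever any remain, even if that row alone exceeds max_bytes.
    """
    buffer: List[Row] = []
    buffered = 0
    for row in rows:
        buffer.append(row)
        buffered += row_size(row)
        if buffered >= max_bytes:
            break  # leave the remaining rows in the iterator for the next call
    return buffer


def scan_in_batches(rows: Iterable[Row], max_bytes: int) -> Iterator[List[Row]]:
    """Yield byte-bounded batches, as a client-side scanner might fetch them."""
    it = iter(rows)
    while True:
        batch = fill_scan_buffer(it, max_bytes)
        if not batch:
            return
        yield batch


class WriteBuffer:
    """Hypothetical byte-bounded Put buffer, analogous to disabling auto flush."""

    def __init__(self, max_bytes: int, flush_fn: Callable[[List[Row]], None]):
        self.max_bytes = max_bytes
        self.flush_fn = flush_fn  # called with each batch of buffered rows
        self.pending: List[Row] = []
        self.pending_bytes = 0

    def put(self, row: Row) -> None:
        self.pending.append(row)
        self.pending_bytes += row_size(row)
        if self.pending_bytes >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.flush_fn(self.pending)
        self.pending = []  # start a fresh list; the flushed one is handed off
        self.pending_bytes = 0


if __name__ == "__main__":
    small = [[(b"k000", b"v" * 6)] for _ in range(25)]   # 10 bytes per row
    large = [[(b"k000", b"v" * 500)] for _ in range(3)]  # 504 bytes per row
    print([len(b) for b in scan_in_batches(small, 100)])  # [10, 10, 5]
    print([len(b) for b in scan_in_batches(large, 100)])  # [1, 1, 1]
```

With the same 100-byte budget, small rows come back ten at a time while large rows come back one at a time, so one default covers both tables. A fixed row count would have to be tuned separately for each, which is the problem the issue describes. A real client would likely also account for per-KeyValue overhead rather than raw key and value lengths.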