I have some tables with large rows and some tables with very small rows, so I
keep my default scanner caching at 1 row, but I have to remember to set it
higher when scanning tables with smaller rows. It would be nice to have a
default that did something reasonable across tables.
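For reference, the per-scan override I keep having to remember looks
something like this (a minimal sketch against the 0.20-era client API; the
table name is made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class SmallRowScan {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "small_rows"); // made-up table name

        Scan scan = new Scan();
        // Override the global hbase.client.scanner.caching default (1)
        // for this scan only: fetch 100 rows per trip to the server.
        scan.setCaching(100);

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            // process row ...
          }
        } finally {
          scanner.close(); // release the server-side scanner
        }
      }
    }

The setCaching() call only affects that one scan; everything else still
falls back to the configured default.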
Would it make sense to set scanner caching as a count of bytes rather than a
count of rows? That would make it similar to the write buffer for batches of
Puts, which gets flushed based on size rather than a fixed number of Puts.
Then there could be some default value that provides decent performance out
of the box.

Dave

On Fri, Nov 20, 2009 at 12:35 PM, Gary Helmling <[email protected]> wrote:

> To set this per scan you should be able to do:
>
>   Scan s = new Scan();
>   s.setCaching(...);
>
> (I think this works anyway)
>
> The other thing that I've found useful is using a PageFilter on scans:
>
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html
>
> I believe this is applied independently on each region server (?), so you
> still need to do your own counting when iterating the results, but it can
> be used to early out on the server side separately from the scanner
> caching value.
>
> --gh
>
> On Fri, Nov 20, 2009 at 3:04 PM, stack <[email protected]> wrote:
>
> > There is this in the configuration:
> >
> >   <property>
> >     <name>hbase.client.scanner.caching</name>
> >     <value>1</value>
> >     <description>Number of rows that will be fetched when calling next
> >     on a scanner if it is not served from memory. Higher caching values
> >     will enable faster scanners but will eat up more memory and some
> >     calls of next may take longer and longer times when the cache is
> >     empty.
> >     </description>
> >   </property>
> >
> > Being able to do it per Scan sounds like something we should add.
> >
> > St.Ack
> >
> > On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein
> > <[email protected]> wrote:
> >
> > > Hi,
> > > Is there a way to specify a limit on the number of returned records
> > > for a scan? I don't see any way to do this when building the scan.
> > > If there is, that would be great. If not, what about when iterating
> > > over the result? If I exit the loop when I reach my limit, will that
> > > approximate this clause? I guess my real question is about how scan
> > > is implemented in the client. I.e., how many records are returned
> > > from HBase at a time as I iterate through the scan result? If I want
> > > 1,000 records and 100 get returned at a time, then I'm in good
> > > shape. On the other hand, if I want 10 records and get 100 at a
> > > time, it's a bit wasteful, though the waste is bounded.
> > >
> > > Thanks,
> > > Adam
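Putting Gary's two suggestions together, a bounded scan for Adam's case
comes out roughly like this (again only a sketch against the 0.20-era API;
the table name and limit are invented). Because PageFilter is applied per
region server, the client-side count is still what enforces the exact limit:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PageFilter;

    public class LimitedScan {
      public static void main(String[] args) throws Exception {
        final int limit = 1000; // invented limit

        HTable table = new HTable(new HBaseConfiguration(), "mytable"); // made-up table name

        Scan scan = new Scan();
        // PageFilter lets each region server stop after roughly `limit`
        // rows, bounding server-side work; the client may still see more
        // than the limit in total across regions.
        scan.setFilter(new PageFilter(limit));
        scan.setCaching(100); // fetch rows in batches of 100 per RPC

        ResultScanner scanner = table.getScanner(scan);
        try {
          int count = 0;
          for (Result row : scanner) {
            // process row ...
            if (++count >= limit) {
              break; // client-side counting enforces the exact limit
            }
          }
        } finally {
          scanner.close(); // release the server-side scanner
        }
      }
    }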
