You can set it on a per-HTable basis: HTable.setScannerCaching(int).
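A minimal sketch of the two caching knobs mentioned in this thread, against the 0.20-era client API (the table name "mytable" is a placeholder; requires a running HBase instance):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScannerCachingExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");

        // Per-table default: scanners opened on this HTable instance
        // fetch 100 rows per next() RPC unless the Scan overrides it.
        table.setScannerCaching(100);

        // Per-scan override, as Gary suggests below:
        Scan scan = new Scan();
        scan.setCaching(500);

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process row
            }
        } finally {
            scanner.close();
        }
    }
}
```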
On Fri, Nov 20, 2009 at 1:43 PM, Dave Latham <[email protected]> wrote:

> I have some tables with large rows and some tables with very small rows,
> so I keep my default scanner caching at 1 row, but have to remember to
> set it higher when scanning tables with smaller rows. It would be nice
> to have a default that did something reasonable across tables.
>
> Would it make sense to set scanner caching as a count of bytes rather
> than a count of rows? That would make it similar to the write buffer for
> batches of puts, which gets flushed based on size rather than a fixed
> number of Puts. Then there could be some default value which should
> provide decent performance out of the box.
>
> Dave
>
> On Fri, Nov 20, 2009 at 12:35 PM, Gary Helmling <[email protected]> wrote:
>
>> To set this per scan you should be able to do:
>>
>> Scan s = new Scan();
>> s.setCaching(...);
>>
>> (I think this works, anyway.)
>>
>> The other thing that I've found useful is using a PageFilter on scans:
>>
>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html
>>
>> I believe this is applied independently on each region server (?), so
>> you still need to do your own counting while iterating over the results,
>> but it can be used to early out on the server side separately from the
>> scanner caching value.
>>
>> --gh
>>
>> On Fri, Nov 20, 2009 at 3:04 PM, stack <[email protected]> wrote:
>>
>> > There is this in the configuration:
>> >
>> > <property>
>> >   <name>hbase.client.scanner.caching</name>
>> >   <value>1</value>
>> >   <description>Number of rows that will be fetched when calling next
>> >   on a scanner if it is not served from memory. Higher caching values
>> >   will enable faster scanners but will eat up more memory and some
>> >   calls of next may take longer and longer times when the cache is
>> >   empty.
>> >   </description>
>> > </property>
>> >
>> > Being able to do it per Scan sounds like something we should add.
>> >
>> > St.Ack
>> >
>> > On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein
>> > <[email protected]> wrote:
>> >
>> > > Hi,
>> > > Is there a way to specify a limit on the number of returned records
>> > > for a scan? I don't see any way to do this when building the scan.
>> > > If there is, that would be great. If not, what about when iterating
>> > > over the result? If I exit the loop when I reach my limit, will that
>> > > approximate this clause? I guess my real question is about how scan
>> > > is implemented in the client, i.e. how many records are returned
>> > > from HBase at a time as I iterate through the scan result? If I want
>> > > 1,000 records and 100 get returned at a time, then I'm in good
>> > > shape. On the other hand, if I want 10 records and get 100 at a
>> > > time, it's a bit wasteful, though the waste is bounded.
>> > >
>> > > Thanks,
>> > > Adam
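Putting the thread's advice together, a sketch of Adam's bounded scan: a PageFilter for the server-side early-out Gary describes, plus client-side counting to enforce the exact limit (the table name "mytable" and the limit value are placeholders; requires a running HBase instance):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;

public class ScanLimitExample {
    public static void main(String[] args) throws Exception {
        final long limit = 1000;

        Scan scan = new Scan();
        // Server-side early-out: each region server stops after roughly
        // `limit` rows, so the client may still see more than `limit`
        // rows in total and must count for itself.
        scan.setFilter(new PageFilter(limit));
        // Keep the per-RPC batch no larger than the limit, so a small
        // limit does not pull back a big, mostly wasted batch.
        scan.setCaching((int) Math.min(limit, 100));

        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        ResultScanner scanner = table.getScanner(scan);
        try {
            long seen = 0;
            for (Result r : scanner) {
                // process row...
                if (++seen >= limit) {
                    break; // client-side count enforces the exact limit
                }
            }
        } finally {
            scanner.close();
        }
    }
}
```

This mirrors Adam's "exit the loop at my limit" idea: the waste is bounded by the caching value, and the PageFilter keeps the servers from scanning far past the limit on each region.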
