I have some tables with large rows and some tables with very small rows, so I
keep my default scanner caching at 1 row, but I have to remember to set it
higher when scanning tables with smaller rows. It would be nice to have a
default that did something reasonable across tables.
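For reference, the per-scan override I keep having to remember looks
something like this (a minimal sketch against the 0.20-era client API; the
table name is made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class SmallRowScan {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "small_rows"); // made-up table name

        Scan scan = new Scan();
        // Override the global hbase.client.scanner.caching default (1)
        // for this scan only: fetch 100 rows per trip to the server.
        scan.setCaching(100);

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            // process row ...
          }
        } finally {
          scanner.close(); // release the server-side scanner
        }
      }
    }

The setCaching() call only affects that one scan; everything else still
falls back to the configured default.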
Would it make sense to set scanner caching as a count of bytes rather than a
count of rows? That would make it similar to the write buffer for batches of
Puts, which gets flushed based on size rather than a fixed number of Puts.
Then there could be some default value that provides decent performance out
of the box.

Dave

On Fri, Nov 20, 2009 at 12:35 PM, Gary Helmling <[email protected]> wrote:

> To set this per scan you should be able to do:
>
>   Scan s = new Scan();
>   s.setCaching(...);
>
> (I think this works anyway)
>
> The other thing that I've found useful is using a PageFilter on scans:
>
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html
>
> I believe this is applied independently on each region server (?), so you
> still need to do your own counting when iterating the results, but it can
> be used to early out on the server side separately from the scanner
> caching value.
>
> --gh
>
> On Fri, Nov 20, 2009 at 3:04 PM, stack <[email protected]> wrote:
>
> > There is this in the configuration:
> >
> >   <property>
> >     <name>hbase.client.scanner.caching</name>
> >     <value>1</value>
> >     <description>Number of rows that will be fetched when calling next
> >     on a scanner if it is not served from memory. Higher caching values
> >     will enable faster scanners but will eat up more memory and some
> >     calls of next may take longer and longer times when the cache is
> >     empty.
> >     </description>
> >   </property>
> >
> > Being able to do it per Scan sounds like something we should add.
> >
> > St.Ack
> >
> > On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein
> > <[email protected]> wrote:
> >
> > > Hi,
> > > Is there a way to specify a limit on the number of returned records
> > > for a scan? I don't see any way to do this when building the scan.
> > > If there is, that would be great. If not, what about when iterating
> > > over the result? If I exit the loop when I reach my limit, will that
> > > approximate this clause? I guess my real question is about how scan
> > > is implemented in the client. I.e., how many records are returned
> > > from HBase at a time as I iterate through the scan result? If I want
> > > 1,000 records and 100 get returned at a time, then I'm in good
> > > shape. On the other hand, if I want 10 records and get 100 at a
> > > time, it's a bit wasteful, though the waste is bounded.
> > >
> > > Thanks,
> > > Adam
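Putting Gary's two suggestions together, a bounded scan for Adam's case
comes out roughly like this (again only a sketch against the 0.20-era API;
the table name and limit are invented). Because PageFilter is applied per
region server, the client-side count is still what enforces the exact limit:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PageFilter;

    public class LimitedScan {
      public static void main(String[] args) throws Exception {
        final int limit = 1000; // invented limit

        HTable table = new HTable(new HBaseConfiguration(), "mytable"); // made-up table name

        Scan scan = new Scan();
        // PageFilter lets each region server stop after roughly `limit`
        // rows, bounding server-side work; the client may still see more
        // than the limit in total across regions.
        scan.setFilter(new PageFilter(limit));
        scan.setCaching(100); // fetch rows in batches of 100 per RPC

        ResultScanner scanner = table.getScanner(scan);
        try {
          int count = 0;
          for (Result row : scanner) {
            // process row ...
            if (++count >= limit) {
              break; // client-side counting enforces the exact limit
            }
          }
        } finally {
          scanner.close(); // release the server-side scanner
        }
      }
    }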
