Re: Tip when scanning and spending a lot of time on each row

Lars George Sun, 19 Apr 2009 12:53:15 -0700

Hi J-D,

This seems to be really important news, as we have had quite a few of those reported lately with no apparent reason. Could you please add this to the Wiki troubleshooting (or similar) page?


Regards,
Lars


Jean-Daniel Cryans wrote:

Hey list,

Just a small tip for those who use the scanners in HBase and whose
processing takes more than 2-3 seconds per row: lower
hbase.client.scanner.caching. When I wrote that feature, my tests
showed me that a value of 30 gives the best speed vs. memory
consumption trade-off. 80% of the time, that's what you need. In the
case I first described, though, you will very likely hit scanner
timeouts (or unknown scanner exceptions). Why? Some simple maths:

Default lease time: 60 secs
Example row processing time: 3 secs
Scanner prefetching value: 30

That means that you will fetch 30 rows in a single batch on the first
next(), then take the other 29 directly from the client-side cache,
and only then query a region server again for 30 more. Since
3 * 30 = 90 seconds pass between those two calls and 90 > 60, you get
a scanner timeout. In one case recently, it was taking me more than 2
minutes per row (RSS crawling), so timeouts were inevitable.

You can set this value in hbase-site.xml, in an HBaseConfiguration
object, or by using HTable.setScannerCaching.
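
If you want to pick the value rather than guess it, a rough rule is to keep caching times the worst-case per-row time well under the lease. The helper below is a hypothetical sketch (not part of HBase); `ScannerCachingAdvisor`, `maxSafeCaching`, `timeoutUnavoidable` and the `safetyFraction` parameter are illustrative assumptions. Its result is what you would then put in hbase.client.scanner.caching or pass to HTable.setScannerCaching.

```java
/**
 * Hypothetical helper (not HBase API): largest scanner caching value whose
 * batch can be processed within the lease, using only a fraction of the lease
 * as a safety margin.
 */
public class ScannerCachingAdvisor {

    static int maxSafeCaching(double leaseSeconds, double secondsPerRow, double safetyFraction) {
        if (leaseSeconds <= 0 || secondsPerRow <= 0 || safetyFraction <= 0 || safetyFraction > 1) {
            throw new IllegalArgumentException("arguments must be positive and safetyFraction <= 1");
        }
        // Rows that fit in the usable part of the lease.
        int rows = (int) Math.floor(leaseSeconds * safetyFraction / secondsPerRow);
        return Math.max(1, rows); // caching cannot go below one row per fetch
    }

    /** True if a single row takes longer than the lease, so even caching of 1 cannot help. */
    static boolean timeoutUnavoidable(double leaseSeconds, double secondsPerRow) {
        return secondsPerRow > leaseSeconds;
    }

    public static void main(String[] args) {
        // 60 s lease, 3 s per row, use half the lease: 30 / 3 = 10 rows per fetch.
        System.out.println(maxSafeCaching(60, 3, 0.5));   // 10
        // RSS crawling case: 2 minutes per row exceeds the lease whatever the caching.
        System.out.println(maxSafeCaching(60, 120, 0.5)); // 1
        System.out.println(timeoutUnavoidable(60, 120));  // true
    }
}
```

The second case matches the RSS crawling example: once one row alone outlasts the lease, no caching value avoids the timeout.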

J-D
