Hi J-D,
This seems like really important news, as we have had quite a few of
those timeouts reported lately with no apparent reason. Could you
please add this to the Wiki troubleshooting (or similar) page?
Regards,
Lars
Jean-Daniel Cryans wrote:
Hey list,
Just a small tip for those who use scanners in HBase and whose
processing time is more than 2-3 seconds per row : lower
hbase.client.scanner.caching. When I wrote that feature, my tests
showed me that a value of 30 gives the best trade-off between speed
and memory consumption. 80% of the time, that's what you need. In the
case I first described, though, you will very likely hit scanner
timeouts (or unknown scanner exceptions). Why? Some simple maths :
Default lease time : 60 secs
Example row processing time : 3 secs
Scanner prefetching value : 30
That means that you will fetch 30 rows in a single batch on the first
next(), then take the other 29 directly from the client cache, then
go back to a region server for 30 more. Since 3*30 = 90 secs and
that's > 60 secs, the lease expires before the next fetch and you get
a scanner timeout. In one case recently, it was taking me more than
2 minutes per row (RSS crawling), so timeouts were inevitable.
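
For anyone who wants to plug in their own numbers, here is a rough
sketch of the same arithmetic in plain Java. It does not touch HBase
at all; the class and method names are made up for illustration, and
it assumes the simple model above (the lease is only renewed when the
client goes back to the region server, and every row takes about the
same time to process).

```java
// Back-of-the-envelope check of the lease arithmetic described above.
// ScannerLeaseMath and its methods are illustrative names, not HBase API.
final class ScannerLeaseMath {

    // Approximate seconds between two trips to the region server:
    // every cached row is processed before the next batch is requested.
    static long secsBetweenFetches(int caching, long secsPerRow) {
        return (long) caching * secsPerRow;
    }

    // True if the gap between fetches reaches or exceeds the lease time.
    // A gap exactly equal to the lease is treated as unsafe.
    static boolean leaseExpires(int caching, long secsPerRow, long leaseSecs) {
        return secsBetweenFetches(caching, secsPerRow) >= leaseSecs;
    }

    // Largest caching value that keeps the gap strictly under the lease.
    // Never returns less than 1; secsPerRow must be greater than 0.
    static int maxSafeCaching(long secsPerRow, long leaseSecs) {
        long rows = (leaseSecs - 1) / secsPerRow;
        return (int) Math.max(1L, rows);
    }

    public static void main(String[] args) {
        long leaseSecs = 60;  // default lease time from the example
        long secsPerRow = 3;  // example row processing time
        int caching = 30;     // scanner prefetching value from the example

        System.out.println("Gap between fetches (secs): "
                + secsBetweenFetches(caching, secsPerRow));
        System.out.println("Lease expires: "
                + leaseExpires(caching, secsPerRow, leaseSecs));
        System.out.println("Largest safe caching value: "
                + maxSafeCaching(secsPerRow, leaseSecs));
    }
}
```

Note that when a single row takes longer than the lease itself (as in
the 2 minutes per row case), the sketch still returns 1, but even a
caching value of 1 leaves a gap longer than the lease under this
model, so lowering the caching alone is not enough there.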
You can set this value in hbase-site.xml, on an HBaseConfiguration
object, or by using HTable.setScannerCaching.
J-D