Lars,

Good idea. It's now on the troubleshooting page, and hbase.client.scanner.caching is set to 1 by default in trunk.

J-D

On Sun, Apr 19, 2009 at 3:52 PM, Lars George wrote:
> Hi J-D,
>
> This seems like really important news, as we have had quite a few of
> these reported lately with no apparent reason. Could you please add this
> to the Wiki troubleshooting (or similar) page?
>
> Regards,
> Lars
>
>
> Jean-Daniel Cryans wrote:
>>
>> Hey list,
>>
>> Just a small tip for those who use scanners in HBase and whose
>> processing time is more than 2-3 seconds per row: lower
>> hbase.client.scanner.caching. When I wrote that feature, my tests
>> showed me that a value of 30 gives the best speed vs. memory
>> consumption. 80% of the time, that's what you need. In the case I
>> first described, you will very likely hit scanner timeouts (or
>> unknown scanner exceptions). Why? Some simple math:
>>
>> Default lease time: 60 secs
>> Example row processing time: 3 secs
>> Scanner prefetching value: 30
>>
>> That means that you will fetch 30 rows in a single batch on the first
>> next(), then you will take the 29 others directly from the client
>> cache, then you will query a region server again for 30 more. Since
>> 3*30 = 90 and that's > 60, you get a scanner timeout. In one case
>> recently, it was taking me more than 2 minutes per row (RSS crawling),
>> so timeouts were inevitable.
>>
>> You can set this value in hbase-site, in an HBaseConfiguration object,
>> or using HTable.setScannerCaching.
>>
>> J-D
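
For anyone who wants to check their own numbers, the arithmetic in the quoted tip can be sketched in plain Java. This is only an illustration of the 3*30 > 60 reasoning: the class and method names (ScannerLeaseMath, batchProcessingSeconds, exceedsLease, maxCachingWithinLease) are made up for this example and are not part of HBase, and it assumes the lease is only renewed when the client goes back to the region server for the next batch, as the message describes.

```java
// Sketch only: plain arithmetic, no HBase classes involved.
// ScannerLeaseMath and its methods are hypothetical names for illustration.
final class ScannerLeaseMath {

    private ScannerLeaseMath() {}

    /** Seconds spent processing one cached batch before the next trip to the region server. */
    static long batchProcessingSeconds(long rowSeconds, int caching) {
        return rowSeconds * caching;
    }

    /** True when processing a full batch takes longer than the scanner lease. */
    static boolean exceedsLease(long leaseSeconds, long rowSeconds, int caching) {
        return batchProcessingSeconds(rowSeconds, caching) > leaseSeconds;
    }

    /**
     * Largest caching value whose batch still fits within the lease.
     * Returns 0 when even a single row takes longer than the lease.
     * Leaves no safety margin, so a real setting should be lower.
     */
    static long maxCachingWithinLease(long leaseSeconds, long rowSeconds) {
        if (rowSeconds <= 0) {
            throw new IllegalArgumentException("rowSeconds must be positive");
        }
        return leaseSeconds / rowSeconds;
    }

    public static void main(String[] args) {
        long lease = 60;      // default lease time from the message, in seconds
        long perRow = 3;      // example row processing time, in seconds
        int caching = 30;     // scanner caching value from the message

        System.out.println("Batch time (s): " + batchProcessingSeconds(perRow, caching));
        System.out.println("Exceeds lease: " + exceedsLease(lease, perRow, caching));
        System.out.println("Max caching within lease: " + maxCachingWithinLease(lease, perRow));
    }
}
```

With the 2-minutes-per-row case from the message, maxCachingWithinLease(60, 120) comes out as 0, which matches the remark that timeouts were inevitable there regardless of the caching value.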
