Lars,

Good idea. It's now on the troubleshooting page and
hbase.client.scanner.caching is set to 1 by default in trunk.

J-D

On Sun, Apr 19, 2009 at 3:52 PM, Lars George <[email protected]> wrote:
> Hi J-D,
>
> This seems like really important news, as we have had quite a few of
> those reported lately with no apparent reason. Could you please add this
> to the Wiki troubleshooting (or similar) page?
>
> Regards,
> Lars
>
>
> Jean-Daniel Cryans wrote:
>>
>> Hey list,
>>
>> Just a small tip for those who use scanners in HBase and whose
>> processing time is more than 2-3 seconds per row: lower
>> hbase.client.scanner.caching. When I wrote that feature, my tests
>> showed me that a value of 30 gives the best trade-off between speed
>> and memory consumption. 80% of the time, that's what you need. In the
>> case I first described, you will very likely hit scanner timeouts (or
>> unknown scanner exceptions). Why? Some simple math:
>>
>> Default lease time : 60 secs
>> Example row processing time : 3 secs
>> Scanner prefetching value : 30
>>
>> That means that you will fetch 30 rows in a single batch on the first
>> next(), then take the other 29 directly from the client cache, then
>> query a region server again for 30 more. Since 3*30 = 90 secs and
>> that's > 60 secs, you get a scanner timeout. In one case recently,
>> it was taking me more than 2 minutes per row (RSS crawling), so
>> timeouts were inevitable.
>>
>> You can set this value in hbase-site.xml, on an HBaseConfiguration
>> object, or by using HTable.setScannerCaching.
>>
>> J-D
>>
>>
>
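
For anyone who wants to redo the arithmetic from the quoted tip with their own numbers, here is a minimal sketch in plain Java. It does not call HBase at all: the class and method names are hypothetical, and it assumes the lease is only renewed when the client goes back to the region server for the next batch, as described above.

```java
// Hypothetical helper, not part of HBase: it only redoes the lease arithmetic
// from the tip (caching value * per-row processing time vs. lease time).
final class ScannerLeaseMath {
    private ScannerLeaseMath() {}

    /** Approximate time the client spends on one cached batch before calling the server again. */
    static long millisBetweenServerCalls(int caching, long perRowMillis) {
        return (long) caching * perRowMillis;
    }

    /** True when processing one full batch takes longer than the scanner lease. */
    static boolean batchExceedsLease(int caching, long perRowMillis, long leaseMillis) {
        return millisBetweenServerCalls(caching, perRowMillis) > leaseMillis;
    }

    /**
     * Largest caching value whose batch finishes strictly inside the lease.
     * Never returns less than 1; if a single row already takes longer than
     * the lease, a value of 1 is returned but is still not safe.
     */
    static int largestSafeCaching(long perRowMillis, long leaseMillis) {
        if (perRowMillis <= 0) {
            throw new IllegalArgumentException("perRowMillis must be positive");
        }
        long n = (leaseMillis - 1) / perRowMillis;
        return (int) Math.max(1L, Math.min(n, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long leaseMillis = 60_000L;  // default lease time from the tip: 60 secs
        long perRowMillis = 3_000L;  // example row processing time: 3 secs

        // 30 rows * 3 secs = 90 secs, which is more than the 60 sec lease.
        System.out.println(batchExceedsLease(30, perRowMillis, leaseMillis));
        // Largest caching value that keeps a batch under the lease.
        System.out.println(largestSafeCaching(perRowMillis, leaseMillis));
    }
}
```

With the numbers from the tip, the first call reports that a caching value of 30 overruns the lease, and the second suggests staying below 20 rows per batch. When a single row takes longer than the lease, as in the 2-minutes-per-row case, no caching value helps on its own.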
