Re: Scanner timeout -- any reason not to raise?

2013-03-21 Thread Alok Singh
Dan, One of the ways we get around the scanner timeouts is to keep track of the last row that was read and restart the scan from that row. --
boolean scanComplete = false;
while (!scanComplete){
    long lastFetchTs = 0;
    scanner = table.getScanner(scan);
    Result
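The restart-from-last-row idea described above can be sketched without an HBase cluster. This is a minimal illustration, not the code from the message: `FakeTable`, `scanAfter`, and `ScannerTimeoutException` are hypothetical stand-ins for the HBase table, scanner, and timeout error, and the "timeout" is simulated by failing after a fixed number of rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

class ResumableScan {
    /** Hypothetical stand-in for a scanner lease expiring mid-scan. */
    static class ScannerTimeoutException extends RuntimeException {}

    /** Hypothetical stand-in for a table whose scanners fail after a few rows. */
    static class FakeTable {
        final NavigableMap<String, String> rows = new TreeMap<>();
        final int rowsBeforeTimeout; // must be >= 1, or the retry loop never progresses
        int scannersOpened = 0;

        FakeTable(int rowsBeforeTimeout) {
            this.rowsBeforeTimeout = rowsBeforeTimeout;
        }

        /** Visits rows strictly after lastRow, or from the first row if lastRow is null. */
        void scanAfter(String lastRow, Consumer<Map.Entry<String, String>> visitor) {
            scannersOpened++;
            NavigableMap<String, String> view =
                    lastRow == null ? rows : rows.tailMap(lastRow, false);
            int served = 0;
            for (Map.Entry<String, String> e : view.entrySet()) {
                if (served == rowsBeforeTimeout) {
                    throw new ScannerTimeoutException(); // simulated lease expiry
                }
                visitor.accept(e);
                served++;
            }
        }
    }

    /** Reads every row, reopening the scan after the last row seen on each timeout. */
    static List<String> scanAll(FakeTable table) {
        List<String> seen = new ArrayList<>();
        String[] lastRow = {null}; // last row successfully processed
        boolean scanComplete = false;
        while (!scanComplete) {
            try {
                table.scanAfter(lastRow[0], e -> {
                    seen.add(e.getKey());
                    lastRow[0] = e.getKey();
                });
                scanComplete = true;
            } catch (ScannerTimeoutException ex) {
                // Fall through: the next iteration opens a new scan after lastRow.
            }
        }
        return seen;
    }
}
```

The key design point is that the resume position is exclusive: the new scan starts strictly after the last row that was fully processed, so no row is handled twice.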

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Dan Crosta
I'm confused -- I only see one setting in CDH manager, what is the name of the other setting? Our load is moderately frequent small writes (in batches of 1000 cells at a time, typically split over a few hundred rows -- these complete very fast, we haven't seen any timeouts there), and

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Ted Yu
In 0.94, there is only one setting. See the release notes of HBASE-6170, which is in 0.95. Looks like this should help (in 0.95): https://issues.apache.org/jira/browse/HBASE-2214 (Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly). From your description, you should be

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Bryan Beaudreault
Typically it is better to use caching and batch size to limit the number of rows returned and thus the amount of processing required between calls to next() during a scan, but it would be nice if HBase provided a way to manually refresh a lease similar to Hadoop's context.progress(). In a cluster
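As a rough illustration of the caching advice above: if the client fetches N rows per call and the lease is only renewed when it goes back for the next batch, then N times the per-row processing time has to stay under the lease timeout. The helper below is a sketch under that assumption; `ScanCaching`, `maxCaching`, and the safety factor are hypothetical names chosen for this example, not part of HBase. The result would be passed to `Scan.setCaching`.

```java
final class ScanCaching {
    private ScanCaching() {}

    /**
     * Largest caching value whose worst-case batch processing time fits
     * inside a fraction (safetyFactor) of the scanner lease timeout.
     * Always returns at least 1, since a scan must fetch one row per call.
     */
    static int maxCaching(long leaseTimeoutMs, long worstCaseMsPerRow, double safetyFactor) {
        if (leaseTimeoutMs <= 0 || worstCaseMsPerRow <= 0
                || safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException("arguments out of range");
        }
        long budgetMs = (long) (leaseTimeoutMs * safetyFactor);
        long rows = budgetMs / worstCaseMsPerRow;
        return (int) Math.max(1, Math.min(rows, Integer.MAX_VALUE));
    }
}
```

For example, with the default 60s lease, a worst case of 250 ms of processing per row, and half the lease kept as headroom, `ScanCaching.maxCaching(60000, 250, 0.5)` gives 120 rows per fetch. When a single row can take longer than the lease, no caching value helps, and the helper simply returns 1.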

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Ted Yu
bq. if HBase provided a way to manually refresh a lease similar to Hadoop's context.progress() Can you outline how the above works for a long scan? bq. Even being able to override the timeout on a per-scan basis would be nice. Agreed. On Wed, Mar 20, 2013 at 10:05 AM, Bryan Beaudreault

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Bryan Beaudreault
I was thinking something like this:
Scan scan = new Scan(startRow, endRow);
scan.setCaching(someVal); // based on what we expect most rows to take for processing time
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
    // usual processing, the time for which we

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Ted Yu
Bryan: Interesting idea. You can log a JIRA with the following two suggestions. On Wed, Mar 20, 2013 at 10:39 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: I was thinking something like this: Scan scan = new Scan(startRow, endRow); scan.setCaching(someVal); // based on what we

Re: Scanner timeout -- any reason not to raise?

2013-03-20 Thread Bryan Beaudreault
Thanks Ted, I've submitted https://issues.apache.org/jira/browse/HBASE-8157. On Wed, Mar 20, 2013 at 1:56 PM, Ted Yu yuzhih...@gmail.com wrote: Bryan: Interesting idea. You can log a JIRA with the following two suggestions. On Wed, Mar 20, 2013 at 10:39 AM, Bryan Beaudreault

Scanner timeout -- any reason not to raise?

2013-03-17 Thread Dan Crosta
We occasionally get scanner timeout errors such as "66698ms passed since the last invocation, timeout is currently set to 60000" when iterating a scanner through the Thrift API. Is there any reason not to raise the timeout to something larger than the default 60s? Put another way, what resources

Re: Scanner timeout -- any reason not to raise?

2013-03-17 Thread Ted Yu
Which HBase version are you using? In 0.94 and prior, the config param is hbase.regionserver.lease.period. In 0.95, it is different. See the release notes of HBASE-6170. On Sun, Mar 17, 2013 at 11:46 AM, Dan Crosta d...@magnetic.com wrote: We occasionally get scanner timeout errors such as 66698ms

Re: Scanner timeout -- any reason not to raise?

2013-03-17 Thread Dan Crosta
Ah, thanks Ted -- I was wondering what that setting was for. We are using CDH 4.2.0, which is HBase 0.94.2 (give or take a few backports from 0.94.3). Is there any harm in setting the lease timeout to something larger, like 5 or 10 minutes? Thanks, - Dan On Mar 17, 2013, at 1:46 PM, Ted Yu

Re: Scanner timeout -- any reason not to raise?

2013-03-17 Thread Ted Yu
The lease timeout is used by row locking too. That's the reason behind splitting the setting into two config parameters. What is your load composition? Do you mostly serve reads from HBase? Cheers On Sun, Mar 17, 2013 at 1:56 PM, Dan Crosta d...@magnetic.com wrote: Ah, thanks Ted -- I was