Bryan Beaudreault created HBASE-26997:
-----------------------------------------

             Summary: Auto renew scanner lease in TableRecordReader
                 Key: HBASE-26997
                 URL: https://issues.apache.org/jira/browse/HBASE-26997
             Project: HBase
          Issue Type: New Feature
            Reporter: Bryan Beaudreault
            Assignee: Bryan Beaudreault


A common problem with Hadoop jobs is a mapper that takes too long to process 
individual inputs. This is especially problematic with TableInputFormat: if 
a scanner.next() batch is not processed within the scanner timeout period, 
the job fails with UnknownScannerException.

The usual fix is to reduce Scan.setCaching so that fewer rows are returned 
in each batch. This is not always a good solution: batches may not be 
uniform in their processing time, and even a single row (the smallest 
caching size) may take a long time to process.

We can improve this for users by providing a configurable period at which 
TableRecordReader automatically calls scanner.renewLease(), unless next() 
was called recently.
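
A minimal sketch of the idea, not the actual patch. LeaseScanner is a 
hypothetical stand-in for the renewLease() method of HBase's ResultScanner, 
and LeaseRenewer, recordNext(), and renewIfIdle() are illustrative names 
only. It assumes the reader calls recordNext() on every fetch, and it does 
not address synchronizing renewLease() with a concurrent next().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the one scanner method this sketch needs. */
interface LeaseScanner {
    boolean renewLease();
}

/** Illustrative helper: renews the lease when next() has not run recently. */
final class LeaseRenewer implements AutoCloseable {
    private final LeaseScanner scanner;
    private final long periodMillis;
    private final LongSupplier nanoClock;   // injectable so the logic is testable
    private final AtomicLong lastNextNanos;
    private ScheduledExecutorService timer; // created lazily by start()

    LeaseRenewer(LeaseScanner scanner, long periodMillis, LongSupplier nanoClock) {
        this.scanner = scanner;
        this.periodMillis = periodMillis;
        this.nanoClock = nanoClock;
        this.lastNextNanos = new AtomicLong(nanoClock.getAsLong());
    }

    /** Starts a daemon timer that checks for idleness once per period. */
    synchronized void start() {
        if (timer != null) {
            return;
        }
        timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "scanner-lease-renewer");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(this::renewIfIdle,
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** The reader calls this each time it fetches from the scanner. */
    void recordNext() {
        lastNextNanos.set(nanoClock.getAsLong());
    }

    /** Renews only if no next() was recorded within the last period. */
    boolean renewIfIdle() {
        long idleNanos = nanoClock.getAsLong() - lastNextNanos.get();
        if (idleNanos < TimeUnit.MILLISECONDS.toNanos(periodMillis)) {
            return false; // next() ran recently, so the lease is still fresh
        }
        scanner.renewLease();
        return true;
    }

    @Override
    public synchronized void close() {
        if (timer != null) {
            timer.shutdownNow();
        }
    }
}
```

In real use the clock would be System::nanoTime and the period would be set 
well below the scanner timeout, so that at least one renewal lands before 
the lease expires.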



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
