Bryan Beaudreault created HBASE-26997:
-----------------------------------------
Summary: Auto renew scanner lease in TableRecordReader
Key: HBASE-26997
URL: https://issues.apache.org/jira/browse/HBASE-26997
Project: HBase
Issue Type: New Feature
Reporter: Bryan Beaudreault
Assignee: Bryan Beaudreault
A common problem with Hadoop jobs is a mapper that takes too long to process
individual inputs. This is especially problematic with TableInputFormat: if
you don't finish processing a scanner.next() batch within the scanner timeout
period, the scanner lease expires and your job fails with
UnknownScannerException.
The usual fix is to reduce Scan.setCaching so that fewer rows are returned in
each batch. This isn't always a good solution: batches may not be uniform in
their processing time, and even a single row (the smallest caching size) can
take a long time to process.
We can improve this for users by providing a configurable period at which the
TableRecordReader will automatically call scanner.renewLease() unless next()
was recently called.
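
As a rough illustration of the idea, here is a minimal sketch. It is not the
proposed patch. LeaseRenewer, Renewable, markNextCalled and renewIfIdle are
hypothetical names invented for this example; Renewable stands in for the
scanner's renewLease() method so that the snippet compiles without HBase on
the classpath. It assumes that calling renewLease() from a timer thread while
the mapper thread is busy is safe, which a real implementation would need to
verify or guard with synchronization.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of HBase: renews a scanner lease when
// next() has not been called for at least one full period.
class LeaseRenewer implements AutoCloseable {

    // Stand-in for the single scanner method this sketch relies on.
    interface Renewable {
        boolean renewLease();
    }

    private final Renewable scanner;
    private final long periodMs;
    private final AtomicLong lastNextMs;
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "scanner-lease-renewer");
            t.setDaemon(true); // never keep the task JVM alive
            return t;
        });

    LeaseRenewer(Renewable scanner, long periodMs) {
        this.scanner = scanner;
        this.periodMs = periodMs;
        this.lastNextMs = new AtomicLong(monotonicMs());
    }

    // Monotonic clock in milliseconds; unaffected by wall-clock changes.
    static long monotonicMs() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    }

    // Begin the periodic idle check.
    void start() {
        timer.scheduleAtFixedRate(() -> renewIfIdle(monotonicMs()),
            periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    // The record reader would call this after every scanner.next().
    void markNextCalled() {
        markNextCalled(monotonicMs());
    }

    void markNextCalled(long nowMs) {
        lastNextMs.set(nowMs);
    }

    // Renew only if next() has been idle for at least one period.
    void renewIfIdle(long nowMs) {
        if (nowMs - lastNextMs.get() >= periodMs) {
            scanner.renewLease();
        }
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

In this sketch the record reader would create the renewer with
scanner::renewLease, call start() once, call markNextCalled() after each
next(), and call close() when the split is finished. Because a check that
fires just before a full period has elapsed is skipped, the gap between
contacts with the server can approach twice the period, so the period would
need to stay below half of the scanner timeout.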