Ujjawal Kumar created HBASE-29104:
-------------------------------------

             Summary: Support reading partial rows via snapshot based MR job
                 Key: HBASE-29104
                 URL: https://issues.apache.org/jira/browse/HBASE-29104
             Project: HBase
          Issue Type: Improvement
          Components: snapshots
    Affects Versions: 2.5.10
            Reporter: Ujjawal Kumar
Reading large rows (larger than hbase.table.max.rowsize) via a snapshot-based MR job can fail with org.apache.hadoop.hbase.regionserver.RowTooBigException. One workaround is to increase hbase.table.max.rowsize in the MR job configuration; however, in the worst case this can cause an OOM error within the mapper.

A better fix is to allow snapshot-based MR jobs to read rows partially using Scan#maxResultSize and Scan#allowPartialResults. This currently does not work for snapshot-based MR jobs because ClientSideRegionScanner uses the [default scanner context while reading|https://github.com/apache/hbase/blob/5201ae2de2b4b4d18156ab0c00dd42e7726951c0/hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java#L104], which cannot enforce size-based limits. Allowing the user to pass a custom scanner context that enforces a size limit for snapshot reads (similar to the [one used within RSRpcServices|https://github.com/apache/hbase/blob/5201ae2de2b4b4d18156ab0c00dd42e7726951c0/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3330-L3346] when reading via the regionserver) could solve this.
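To illustrate the behavior the proposal relies on, here is a minimal, stdlib-only Java model of size-limited partial-row reads. It is not HBase code: the names Cell, Chunk and readInChunks are hypothetical, and cell sizes are simplified to a single byte count. It assumes, loosely following how a size-limited scanner context ends a batch, that the limit is checked after each cell is added, so a batch can slightly exceed the limit but always holds at least one cell. The partial flag is conceptually similar to Result#mayHaveMoreCellsInRow on the client side.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of size-limited partial-row reads (not HBase code).
 * A wide row is returned as several chunks instead of one huge Result.
 */
public class PartialRowSketch {

    /** A cell reduced to its row key, qualifier and size in bytes (simplified). */
    record Cell(String row, String qualifier, int sizeBytes) {}

    /** One batch handed to the caller; partial == true means more cells of this row follow. */
    record Chunk(String row, List<Cell> cells, boolean partial) {}

    /**
     * Splits the cells of one row into chunks. The size limit is checked after each
     * cell is added, so every chunk holds at least one cell and may overshoot the
     * limit by at most one cell.
     */
    static List<Chunk> readInChunks(String row, List<Cell> cells, long maxResultSize) {
        List<Chunk> chunks = new ArrayList<>();
        List<Cell> current = new ArrayList<>();
        long accumulated = 0;
        for (int i = 0; i < cells.size(); i++) {
            Cell cell = cells.get(i);
            current.add(cell);
            accumulated += cell.sizeBytes();
            boolean lastCell = i == cells.size() - 1;
            if (accumulated >= maxResultSize && !lastCell) {
                // Size limit reached mid-row: return what we have and flag it as partial.
                chunks.add(new Chunk(row, current, true));
                current = new ArrayList<>();
                accumulated = 0;
            }
        }
        if (!current.isEmpty()) {
            // Final chunk of the row: no more cells follow.
            chunks.add(new Chunk(row, current, false));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Cell> wideRow = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            wideRow.add(new Cell("row-1", "q" + i, 400));
        }
        // With a 1000-byte limit, each partial chunk closes after 3 cells (1200 >= 1000).
        for (Chunk c : readInChunks("row-1", wideRow, 1000)) {
            System.out.println(c.row() + " cells=" + c.cells().size() + " partial=" + c.partial());
        }
    }
}
```

Running main prints three chunks of 3 cells marked partial=true followed by one chunk of 1 cell marked partial=false. The mapper's peak memory for that row is bounded by roughly one chunk rather than the whole row, which is the point of enforcing the limit instead of raising hbase.table.max.rowsize.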