Ujjawal Kumar created HBASE-29104:
-------------------------------------

             Summary: Support reading partial rows via snapshot based MR job
                 Key: HBASE-29104
                 URL: https://issues.apache.org/jira/browse/HBASE-29104
             Project: HBase
          Issue Type: Improvement
          Components: snapshots
    Affects Versions: 2.5.10
            Reporter: Ujjawal Kumar


Reading large rows (> hbase.table.max.rowsize) via a snapshot based MR job can 
fail with
org.apache.hadoop.hbase.regionserver.RowTooBigException. 

One workaround in such cases is to increase the value of 
hbase.table.max.rowsize via the MR job config. However, in the worst case this 
can cause an OOM error within the mapper. 

A better fix is to allow snapshot based MR jobs to read rows partially via 
Scan#maxResultSize and Scan#allowPartialResults. These settings currently have 
no effect for snapshot based MR jobs because ClientSideRegionScanner uses the 
[default scanner context while 
reading|https://github.com/apache/hbase/blob/5201ae2de2b4b4d18156ab0c00dd42e7726951c0/hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java#L104],
which cannot enforce size based limits. 
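
To illustrate what partial reads would mean for a mapper, here is a minimal, 
self-contained sketch. It is a simplified model and not HBase code: the class 
PartialRowSketch, the Chunk type, and the readRow method are hypothetical 
names, cells are represented only by their sizes in bytes, and the rule "emit 
a chunk once the accumulated size reaches the limit" is an assumption made for 
illustration. 

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model (not HBase code) of size-limited partial reads of one row. */
public class PartialRowSketch {

    /** A group of cells from one row; partial is true when more cells of the row follow. */
    public static final class Chunk {
        public final List<Integer> cellSizes;
        public final boolean partial;

        Chunk(List<Integer> cellSizes, boolean partial) {
            this.cellSizes = cellSizes;
            this.partial = partial;
        }
    }

    /**
     * Splits the cells of a single row into chunks. A chunk is emitted as soon as
     * the accumulated size reaches maxResultSize (assumed rule), or at the end of the row.
     */
    public static List<Chunk> readRow(List<Integer> cellSizes, long maxResultSize) {
        if (maxResultSize <= 0) {
            throw new IllegalArgumentException("maxResultSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long accumulated = 0;
        for (int i = 0; i < cellSizes.size(); i++) {
            current.add(cellSizes.get(i));
            accumulated += cellSizes.get(i);
            boolean lastCell = (i == cellSizes.size() - 1);
            if (accumulated >= maxResultSize || lastCell) {
                // Only the final chunk of the row is marked as complete.
                chunks.add(new Chunk(new ArrayList<>(current), !lastCell));
                current.clear();
                accumulated = 0;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Five cells of 40 bytes each, read with a 100 byte limit.
        for (Chunk c : readRow(List.of(40, 40, 40, 40, 40), 100)) {
            System.out.println(c.cellSizes + " partial=" + c.partial);
        }
    }
}
```

In this model the mapper never has to hold more than roughly one limit's worth 
of cells at a time, which is what avoids both the RowTooBigException and the 
OOM risk described above. 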

Allowing the user to pass a custom scanner context that enforces a size limit 
for snapshot reads (similar to the [one used within RSRpcServices 
|https://github.com/apache/hbase/blob/5201ae2de2b4b4d18156ab0c00dd42e7726951c0/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3330-L3346] 
when reading via a regionserver) would solve this. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
