Swapna created HBASE-20618:
------------------------------

             Summary: Skip large rows instead of throwing an exception to client
                 Key: HBASE-20618
                 URL: https://issues.apache.org/jira/browse/HBASE-20618
             Project: HBase
          Issue Type: New Feature
    Affects Versions: 3.0.0
            Reporter: Swapna


Currently HBase throws RowTooBigException when the data for a single column 
family in a row exceeds the configured maximum. See
https://issues.apache.org/jira/browse/HBASE-10925?attachmentOrder=desc
We have some bad rows growing very large. We need a way to skip these rows for 
most of our jobs.

Some of the options we considered:
Option 1:
The HBase client handles the exception and restarts the scanner past the bad 
row, using the row key at which the scan failed. The row key could be passed 
back by adding it to the exception stack trace, which seems brittle. The client 
would have to ignore the setting if it is upgraded before the server.
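
To make the restart logic of Option 1 concrete, here is a minimal, self-contained 
sketch that does not use the HBase API. Every name in it (SkipBigRowsSketch, 
RowTooBigWithKeyException, scanFrom, scanSkippingBigRows) is hypothetical, and it 
assumes the exception can carry the failing row key, which RowTooBigException 
does not do today. In a real client, the "resume after this row" step would 
presumably map to Scan.withStartRow(rowKey, false).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class SkipBigRowsSketch {

    /** Hypothetical exception that carries the key of the oversized row. */
    static class RowTooBigWithKeyException extends RuntimeException {
        final String rowKey;

        RowTooBigWithKeyException(String rowKey) {
            super("Row too big: " + rowKey);
            this.rowKey = rowKey;
        }
    }

    /**
     * Toy stand-in for a scanner: streams row keys into sink in sorted order,
     * starting strictly after startExclusive (or from the first row if null),
     * and throws as soon as it reaches a row larger than maxRowSize.
     */
    static void scanFrom(TreeMap<String, Long> rowSizes, String startExclusive,
                         long maxRowSize, List<String> sink) {
        NavigableMap<String, Long> tail =
            startExclusive == null ? rowSizes : rowSizes.tailMap(startExclusive, false);
        for (Map.Entry<String, Long> e : tail.entrySet()) {
            if (e.getValue() > maxRowSize) {
                throw new RowTooBigWithKeyException(e.getKey());
            }
            sink.add(e.getKey());
        }
    }

    /** Client-side loop: on failure, restart the scan just past the bad row. */
    static List<String> scanSkippingBigRows(TreeMap<String, Long> rowSizes, long maxRowSize) {
        List<String> results = new ArrayList<>();
        String resumeAfter = null;
        while (true) {
            try {
                scanFrom(rowSizes, resumeAfter, maxRowSize, results);
                return results;
            } catch (RowTooBigWithKeyException e) {
                // Each failure moves the start key forward, so the loop terminates.
                resumeAfter = e.rowKey;
            }
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Long> rows = new TreeMap<>();
        rows.put("a", 10L);
        rows.put("b", 5000L); // oversized
        rows.put("c", 20L);
        System.out.println(scanSkippingBigRows(rows, 100L));
    }
}
```

The sketch also shows the cost of this option: every oversized row forces a new 
scan to be opened, and the approach only works if the row key reaches the client 
reliably.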

Option 2:
Skip big rows on the server. This could use a server-level config similar to 
"hbase.table.max.rowsize", or be request-based by changing the scan request 
API. If it is allowed per request via the scan request config, the client will 
have to ignore the setting if it is upgraded before the server.
{code}
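try {
  populateResult(results, this.storeHeap, scannerContext, current);
} catch (RowTooBigException e) {
  // Log only the row-key bytes; getRowArray() returns the cell's whole backing array.
  LOG.info("Row exceeded the limit in storeheap. Skipping row with key: "
      + Bytes.toStringBinary(current.getRowArray(), current.getRowOffset(),
          current.getRowLength()));
  // Seek past the last possible cell of the oversized row.
  this.storeHeap.reseek(PrivateCellUtil.createLastOnRow(current));
  // Discard the partial results and progress accumulated for the skipped row.
  results.clear();
  scannerContext.clearProgress();
  continue;
}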
{code}


We prefer Option 2 with a server-level config. Please share your input.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
