[ 
https://issues.apache.org/jira/browse/HBASE-20618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815712#comment-16815712
 ] 

Swapna commented on HBASE-20618:
--------------------------------

This was done specifically for our use case. If we have some way to handle 
big rows on the server side, we can drop server-side filtering on a (large) 
CF and take advantage of JoinedScanner.

Generalizing this optimization to a list of filters with MUST_PASS_ALL would 
require modifying the filter APIs, which is a big effort.

I would love to hear whether that would be useful for the community. 

Suggestions are welcome, and I will be happy to incorporate changes. 
Otherwise, this can be closed if it is not useful for many users. Thanks.

> Skip large rows instead of throwing an exception to client
> ----------------------------------------------------------
>
>                 Key: HBASE-20618
>                 URL: https://issues.apache.org/jira/browse/HBASE-20618
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Swapna
>            Priority: Minor
>             Fix For: 3.0.0, 1.5.0, 2.3.0
>
>         Attachments: HBASE-20618.hbasemaster.v01.patch, 
> HBASE-20618.hbasemaster.v02.patch, HBASE-20618.v1.branch-1.patch, 
> HBASE-20618.v1.branch-1.patch
>
>
> Currently HBase throws RowTooBigException when the data in one column 
> family of a row exceeds the configured maximum:
> https://issues.apache.org/jira/browse/HBASE-10925?attachmentOrder=desc
> We have some bad rows growing very large. We need a way to skip these rows 
> for most of our jobs.
> Some of the options we considered:
> Option 1:
> The HBase client handles the exception and restarts the scanner past the 
> bad row by capturing the row key where it failed. This could be done by 
> adding the row key to the exception stack trace, which seems brittle. The 
> client would have to ignore the setting if it is upgraded before the server.
> Option 2:
> Skip big rows on the server. Use either a server-level config similar to 
> "hbase.table.max.rowsize" or a per-request setting added to the scan 
> request API. With a per-request setting, the client will have to ignore it 
> if it is upgraded before the server.
> {code}
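> try {
>   populateResult(results, this.storeHeap, scannerContext, current);
> } catch (RowTooBigException e) {
>   // Log only this cell's row key: getRowArray() alone returns the whole
>   // backing array, so the offset and length are needed as well.
>   LOG.info("Row exceeded the limit in storeheap. Skipping row with key: "
>       + Bytes.toString(current.getRowArray(), current.getRowOffset(),
>           current.getRowLength()));
>   // Jump past the oversized row and discard any partial results.
>   this.storeHeap.reseek(PrivateCellUtil.createLastOnRow(current));
>   results.clear();
>   scannerContext.clearProgress();
>   continue;
> }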
> {code}
> We prefer Option 2 with a server-level config. Please share your input.
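
For comparison, the client-side restart in Option 1 can be sketched as 
follows. This is only an illustration and uses no HBase classes: the table is 
simulated with a sorted map of row key to row size, 
RowTooBigWithKeyException is a hypothetical exception that carries the 
failing row key (the real RowTooBigException is not assumed to expose it), 
and closestRowAfter and scanSkippingBigRows are made-up helper names. Row 
keys are modelled as strings here; with byte[] keys the equivalent step would 
be appending a single 0x00 byte.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class SkipBigRowsSketch {

    /** Hypothetical stand-in for an exception that reports the failing row key. */
    static class RowTooBigWithKeyException extends RuntimeException {
        final String rowKey;

        RowTooBigWithKeyException(String rowKey) {
            super("Row too big: " + rowKey);
            this.rowKey = rowKey;
        }
    }

    /** Smallest key that sorts strictly after rowKey: the key plus one NUL character. */
    static String closestRowAfter(String rowKey) {
        return rowKey + '\0';
    }

    /**
     * Simulated client loop: scan in key order, and when a row exceeds
     * maxRowSize, restart the scan just past that row instead of failing.
     */
    static List<String> scanSkippingBigRows(NavigableMap<String, Integer> rowSizes,
                                            int maxRowSize) {
        List<String> delivered = new ArrayList<>();
        String startRow = "";
        while (true) {
            try {
                // One "scanner" pass starting at startRow (inclusive).
                for (Map.Entry<String, Integer> e : rowSizes.tailMap(startRow, true).entrySet()) {
                    if (e.getValue() > maxRowSize) {
                        throw new RowTooBigWithKeyException(e.getKey());
                    }
                    delivered.add(e.getKey());
                }
                return delivered;
            } catch (RowTooBigWithKeyException e) {
                // Rows already delivered are kept; resume after the bad row.
                startRow = closestRowAfter(e.rowKey);
            }
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> table = new TreeMap<>();
        table.put("a", 10);
        table.put("b", 500);
        table.put("c", 20);
        table.put("d", 900);
        table.put("e", 5);
        System.out.println(scanSkippingBigRows(table, 100)); // prints [a, c, e]
    }
}
```

The sketch also shows why Option 1 costs more than a server-side skip: every 
oversized row forces a new scan to be opened from the client.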



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
