[
https://issues.apache.org/jira/browse/HBASE-20618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504816#comment-16504816
]
Josh Elser commented on HBASE-20618:
------------------------------------
{quote}This seems like the wrong way to go about this. HBase has always been
about strong consistency. We fail things rather than return the fastest easiest
answer. That seems like the pattern we should take.
{quote}
Yeah, +1 to this. This seems like the wrong thing to encourage.
{quote}If a row is too big then we already provide the ability to allow partial
results that can facilitate reading rows too large to send in one rpc.
{quote}
{quote}we have a server side filter with hasFilterRow set to true. We drop
results based on some cells missing for a row. And this is incompatible with
partial results as row boundaries are not known.
{quote}
So, the real problem is that your custom server-side filter can't work in
conjunction with the existing functionality to chunk up a row? Shouldn't the
fix be around that?
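For context, the partial-results path referenced above lets a client read a row too large for one RPC by cutting it into size-bounded chunks. A minimal self-contained sketch of that chunking idea — not HBase's actual implementation; `chunkBySize` and the byte counts are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class PartialChunkSketch {
    // Group cell sizes into chunks, cutting a new "partial result" whenever
    // the running total would exceed maxChunkBytes. A single cell larger
    // than the limit still goes out alone rather than being dropped.
    static List<List<Integer>> chunkBySize(List<Integer> cellSizes, int maxChunkBytes) {
        List<List<Integer>> chunks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int running = 0;
        for (int size : cellSizes) {
            if (!current.isEmpty() && running + size > maxChunkBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                running = 0;
            }
            current.add(size);
            running += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A "row" whose cells total 700 bytes, with a 300-byte chunk limit.
        System.out.println(chunkBySize(List.of(100, 200, 150, 250), 300));
        // → [[100, 200], [150], [250]]
    }
}
```

This is why a row-dropping filter with hasFilterRow=true conflicts with partial results: each chunk arrives without knowing where the row ends, so there is no point at which the filter can see the whole row.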
> Skip large rows instead of throwing an exception to client
> ----------------------------------------------------------
>
> Key: HBASE-20618
> URL: https://issues.apache.org/jira/browse/HBASE-20618
> Project: HBase
> Issue Type: New Feature
> Reporter: Swapna
> Priority: Minor
> Fix For: 3.0.0, 2.0.1, 1.4.6
>
> Attachments: HBASE-20618.hbasemaster.v01.patch,
> HBASE-20618.hbasemaster.v02.patch, HBASE-20618.v1.branch-1.patch,
> HBASE-20618.v1.branch-1.patch
>
>
> Currently HBase supports throwing RowTooBigException in case a row's data in
> one column family exceeds the configured maximum
> (https://issues.apache.org/jira/browse/HBASE-10925?attachmentOrder=desc).
> We have some bad rows growing very large, and we need a way to skip these
> rows for most of our jobs.
> Some of the options we considered:
> Option 1:
> The HBase client handles the exception and restarts the scanner past the bad
> row by capturing the row key where it failed. This could be done by adding
> the row key to the exception stack trace, which seems brittle. The client
> would have to ignore the setting if it is upgraded before the server.
> Option 2:
> Skip over big rows on the server. Either go with a server-level config
> similar to "hbase.table.max.rowsize", or make it per-request by changing the
> scan request API. If done per request, the client will have to ignore the
> setting if it is upgraded before the server.
> {code}
> try {
>   populateResult(results, this.storeHeap, scannerContext, current);
> } catch (RowTooBigException e) {
>   // Use the row offset/length; getRowArray() alone returns the whole
>   // backing array, not just the row key.
>   LOG.info("Row exceeded the limit in storeheap. Skipping row with key: "
>       + Bytes.toString(current.getRowArray(), current.getRowOffset(),
>           current.getRowLength()));
>   this.storeHeap.reseek(PrivateCellUtil.createLastOnRow(current));
>   results.clear();
>   scannerContext.clearProgress();
>   continue;
> }
> {code}
> We prefer option 2 with a server-level config. Please share your inputs.
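If option 2 lands as a server-level setting, it would presumably sit next to the existing limit in hbase-site.xml. A sketch with a hypothetical property name — the actual key would come from the patch:

```xml
<!-- Existing limit: scans fail with RowTooBigException past this size. -->
<property>
  <name>hbase.table.max.rowsize</name>
  <value>1073741824</value>
</property>
<!-- Hypothetical new flag: skip oversized rows instead of failing. -->
<property>
  <name>hbase.server.scanner.skip.large.rows</name>
  <value>true</value>
</property>
```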
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)