[jira] [Commented] (HBASE-28595) Losing exception from scan RPC can lead to partial results

Duo Zhang (Jira) Thu, 16 May 2024 00:49:03 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-28595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846860#comment-17846860
 ]


Duo Zhang commented on HBASE-28595:
-----------------------------------

The server-side logic exists to keep compatibility with some old clients; it is 
really a pain if a new client has to stay compatible with this piece of 
code...

It does not make sense.

{quote}
1. last scan rpc arrives to server, rows are returned, 
more_results_in_region=false
2. the response is lost due to network issues and the scan rpc is retried
3. as the scanner was closed on the server side in 1., an empty result is 
returned
4. the client skips the rows returned in 1. and returns without an error
{quote}
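
The quoted sequence can be illustrated with a toy model. This is only a sketch and not HBase code: `LostScanResponseDemo`, `ToyServer`, and `scanWithLostFirstResponse` are hypothetical names, and the "network loss" is modelled by simply discarding the first return value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the quoted sequence; all names are hypothetical, not HBase APIs. */
public class LostScanResponseDemo {

  /** Minimal stand-in for a region server that holds one scanner's rows. */
  public static class ToyServer {
    private final Set<Long> closedScanners = new HashSet<>();
    private final List<String> remainingRows =
        new ArrayList<>(Arrays.asList("row1", "row2"));

    /** Serves a scan call; a call on an already closed scanner yields an empty result. */
    public List<String> scan(long scannerId) {
      if (closedScanners.contains(scannerId)) {
        // Step 3: the scanner is already closed, so an empty result is returned
        // instead of an error.
        return Collections.emptyList();
      }
      // Step 1: the last rows are returned and the scanner is closed
      // (more_results_in_region=false in the real protocol).
      List<String> rows = new ArrayList<>(remainingRows);
      remainingRows.clear();
      closedScanners.add(scannerId);
      return rows;
    }
  }

  /** Client side: the first response is lost, so the same RPC is retried. */
  public static List<String> scanWithLostFirstResponse(ToyServer server, long scannerId) {
    server.scan(scannerId);        // Step 2: response never reaches the client
    return server.scan(scannerId); // Step 4: retry sees an empty result, no error
  }
}
```

In this model the caller ends up with an empty list although the server handed out two rows, and nothing signals that rows were skipped.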

We could also record the last callSeq in the closed scanners, so that if the 
callSeq repeats, we throw an UnknownScannerException or 
OutOfOrderScannerException.
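
A minimal sketch of that idea, assuming a simple in-memory map rather than the actual RSRpcServices bookkeeping. `ClosedScannerRegistry`, `recordClose`, and `mayReturnEmpty` are hypothetical names, and `RepeatedCallSeqException` is a local stand-in for the HBase exceptions mentioned above, not the real classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: remember the last callSeq served by each closed scanner. */
public class ClosedScannerRegistry {

  /** Local stand-in for OutOfOrderScannerException / UnknownScannerException. */
  public static class RepeatedCallSeqException extends RuntimeException {
    public RepeatedCallSeqException(String message) {
      super(message);
    }
  }

  // scannerId -> callSeq of the last call served before the scanner was closed
  private final Map<Long, Long> lastCallSeqOfClosed = new ConcurrentHashMap<>();

  /** Called when the server closes a scanner after serving lastServedCallSeq. */
  public void recordClose(long scannerId, long lastServedCallSeq) {
    lastCallSeqOfClosed.put(scannerId, lastServedCallSeq);
  }

  /**
   * Called when a scan request arrives for a scanner that is no longer open.
   * A callSeq that repeats (or is older than) the last served one indicates a
   * retry of a call whose response may have been lost, so an error is raised.
   * Otherwise the existing compatibility path (empty response) may be taken.
   */
  public boolean mayReturnEmpty(long scannerId, long callSeq) {
    Long last = lastCallSeqOfClosed.get(scannerId);
    if (last != null && callSeq <= last) {
      throw new RepeatedCallSeqException(
          "scanner " + scannerId + " already served callSeq " + last
              + ", got " + callSeq);
    }
    return true;
  }
}
```

The sketch keeps entries forever; a real implementation would presumably need to expire them, for example together with the closed-scanner records themselves.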

If we still cannot cover all the cases on the server side, I would argue that we 
just revert the code in HBASE-18042, as it exists only to keep compatibility with 
the third-party client in OpenTSDB. If we cannot even make our official client 
correct, we do not need to consider third-party clients...

Thanks.

> Losing exception from scan RPC can lead to partial results
> ----------------------------------------------------------
>
>                 Key: HBASE-28595
>                 URL: https://issues.apache.org/jira/browse/HBASE-28595
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, regionserver, Scanners
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: pull-request-available
>
> This was discovered in Apache Impala using HBase 2.2 based branch hbase 
> client and server. It is not clear yet whether other branches are also 
> affected.
> The issue happens if the server side of the scan throws an exception and 
> closes the scanner, but at the same time, the client gets an rpc connection 
> closed error and doesn't process the exception sent by the server. Client 
> then thinks it got a network error, which leads to retrying the RPC instead 
> of opening a new scanner. But then when the client retry reaches the server, 
> the server returns an empty ScanResponse instead of an error, leading to 
> closing the scanner on client side without returning any error.
> A few pointers to critical parts:
> region server:
> 1st call throws exception leading to closing (but not deleting) scanner:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3539]
> 2nd call (retry of 1st) returns empty results:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3403]
> client:
> some exceptions are handled as non-retriable at RPC level and are only 
> handled through opening a new scanner:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java#L214]
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java#L367]
> This mechanism in the client only works if it gets the exception from the 
> server. If there are connection issues during the RPC then the client won't 
> really know the state of the server.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
