Daniel Roudnitsky created HBASE-29470:
-----------------------------------------

             Summary: Client swallows interrupts during location resolution
                 Key: HBASE-29470
                 URL: https://issues.apache.org/jira/browse/HBASE-29470
             Project: HBase
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.5.12, 2.6.3
            Reporter: Daniel Roudnitsky
            Assignee: Daniel Roudnitsky


+Problem+
For batch requests issued through the 2.x sync client, the client swallows 
interrupts that arrive during region location resolution.

The sync client resolves the region location of each action in a batch request 
sequentially. If the calling thread is interrupted during this process, the 
client swallows the interrupt and records it as a location error for whichever 
action was being resolved at that moment. It then continues location resolution 
for the remaining actions in the batch and executes them.

Once the client finishes processing the rest of the batch request (however 
long that takes), it throws a RetriesExhaustedWithDetailsException because the 
action that was being resolved at the time of the interrupt could not be 
executed. All the remaining actions in the batch will have been processed and 
will have returned a result.

For example, a batch call with 10 actions that is interrupted almost 
immediately after execution starts will not return promptly on interrupt. It 
runs for however long it takes to process the other 9 actions, and ultimately 
results in 1 action failed by the interrupt and 9 successful actions.
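
The behavior in this example can be reproduced without HBase. The following is 
a minimal sketch, not client code: the class SwallowedInterruptDemo and the 
methods locate and runBatch are hypothetical stand-ins for the meta lookup and 
the per-action resolution loop. It assumes only that the lookup converts 
InterruptedException into a plain IOException and that the loop treats every 
IOException as a per-action error.

```java
import java.io.IOException;

class SwallowedInterruptDemo {

  /** Hypothetical stand-in for a blocking meta lookup; not an HBase API. */
  static String locate(int action) throws IOException {
    try {
      // Thread.sleep throws InterruptedException, and clears the interrupt
      // flag, if the thread is interrupted before or during the sleep.
      Thread.sleep(1);
      return "region-" + action;
    } catch (InterruptedException e) {
      // The interrupt is converted to a plain IOException and the flag is
      // not restored, so callers can no longer tell an interrupt happened.
      throw new IOException("interrupted during lookup", e);
    }
  }

  /** Resolves every action in turn; returns {failedCount, succeededCount}. */
  static int[] runBatch(int size) {
    int failed = 0;
    int succeeded = 0;
    for (int action = 0; action < size; action++) {
      try {
        locate(action);
        succeeded++;
      } catch (IOException e) {
        // Treated as an ordinary location error for this one action;
        // the loop carries on with the rest of the batch.
        failed++;
      }
    }
    return new int[] {failed, succeeded};
  }

  public static void main(String[] args) {
    // Simulate an interrupt that arrives right as the batch starts.
    Thread.currentThread().interrupt();
    int[] result = runBatch(10);
    System.out.println("failed=" + result[0] + " succeeded=" + result[1]);
    System.out.println("interrupted=" + Thread.currentThread().isInterrupted());
  }
}
```

Running main prints failed=1 succeeded=9 followed by interrupted=false: the 
first lookup absorbs the interrupt, the other nine proceed normally, and the 
thread's interrupt status is lost.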

+Root cause and solution+
In locateRegionInMeta, where the meta lookup happens, [we rethrow 
InterruptedException as an 
IOException|https://github.com/apache/hbase/blob/d79070d14054195ab38644b3d5c9332073c47455/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L1124].
 [findAllLocationsOrFail treats any IOException as a location 
error|https://github.com/apache/hbase/blob/d79070d14054195ab38644b3d5c9332073c47455/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L555-L557]
 and sets the error for the action that was being processed, after which 
[groupAndSendMulti proceeds as 
usual|https://github.com/apache/hbase/blob/d79070d14054195ab38644b3d5c9332073c47455/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L460-L467]
 and continues to process the rest of the batch. We need special handling for 
InterruptedException in groupAndSendMulti so that the entire batch fails fast 
with an InterruptedIOException.
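
One possible shape for that special handling is sketched below. This is an 
illustration under stated assumptions, not the actual patch: the class 
FastFailOnInterruptSketch, the Locator interface, and the methods 
causedByInterrupt and locateAll are all hypothetical and do not exist in 
HBase. The sketch assumes the interrupt can be recognized by walking the cause 
chain of the IOException raised during location resolution.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

class FastFailOnInterruptSketch {

  /** Hypothetical lookup abstraction; not an HBase API. */
  interface Locator {
    String locate(int action) throws IOException;
  }

  /** True if the exception or any of its causes signals a thread interrupt. */
  static boolean causedByInterrupt(IOException e) {
    for (Throwable t = e; t != null; t = t.getCause()) {
      if (t instanceof InterruptedException || t instanceof InterruptedIOException) {
        return true;
      }
    }
    return false;
  }

  /** Returns how many actions were located; aborts the whole batch on interrupt. */
  static int locateAll(int size, Locator locator) throws InterruptedIOException {
    int located = 0;
    for (int action = 0; action < size; action++) {
      try {
        locator.locate(action);
        located++;
      } catch (IOException e) {
        if (causedByInterrupt(e)) {
          // Restore the interrupt status so callers further up can see it,
          // then fail the entire batch instead of only this action.
          Thread.currentThread().interrupt();
          InterruptedIOException iioe =
              new InterruptedIOException("batch interrupted at action " + action);
          iioe.initCause(e);
          throw iioe;
        }
        // Any other IOException remains a per-action location error.
      }
    }
    return located;
  }

  public static void main(String[] args) {
    Locator interruptedOnFirst = action -> {
      if (action == 0) {
        throw new IOException("interrupted during lookup", new InterruptedException());
      }
      return "region-" + action;
    };
    try {
      locateAll(10, interruptedOnFirst);
    } catch (InterruptedIOException e) {
      System.out.println("aborted: " + e.getMessage());
    }
    // Thread.interrupted() reports the flag and clears it.
    System.out.println("interrupt flag restored: " + Thread.interrupted());
  }
}
```

With this shape, the lookup that observes the interrupt is the last one 
attempted, and the caller sees a single InterruptedIOException rather than a 
per-action error after the rest of the batch has run.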



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
