[
https://issues.apache.org/jira/browse/HBASE-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Murilo Giacometti Rocha updated HBASE-25108:
--------------------------------------------
Description:
In the client, when a MutateRequest times out, the operation is retried in
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries. If the
server processed the first request but the client never received the response,
the retry makes the server return processed=false, because the value written by
the first attempt is already there. The client therefore returns false even
though the checkAndPut succeeded on the first attempt. It should return
processed=true if the value exists and the operation is a retry.
Example (RpcRetryingCallerImpl retrying inside checkAndPut):
request
client ---------o----------> server
client --------------------> server (processing)
client --------------------> server (processed)
(timed out)
retry request
client ---------o----------> server
client --------------------> server (processing)
client --------------------> server (already processed)
response (processed = false)
client <--------o----------- server
checkAndPut returns false, even though it's successful.
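
The sequence above can be imitated without a cluster. The following is a minimal in-memory sketch, not HBase code: FakeServer, clientCheckAndPut, and the isRetry and honorRetry flags are hypothetical names invented for illustration. It assumes the check is "row does not exist yet" and that the response to the first attempt is lost. With honorRetry=false it shows the reported behavior; with honorRetry=true it shows a naive version of the suggested behavior (report success when a retry finds its own value).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-ins; none of these names are HBase classes.
class CheckAndPutRetryDemo {

    /** Toy "server": a single map from row key to value. */
    static class FakeServer {
        final Map<String, String> rows = new HashMap<>();

        // Writes the value only if the row is absent and returns the "processed" flag.
        // isRetry and honorRetry are illustrative assumptions, not real RPC fields.
        boolean checkAndPut(String row, String value, boolean isRetry, boolean honorRetry) {
            String current = rows.get(row);
            if (current == null) {
                rows.put(row, value);
                return true;   // processed = true
            }
            // Suggested behavior: a retry that finds its own value reports success.
            if (honorRetry && isRetry && current.equals(value)) {
                return true;
            }
            return false;      // processed = false
        }
    }

    /** Toy "client": the first response is lost (simulated timeout), so it retries once. */
    static boolean clientCheckAndPut(FakeServer server, String row, String value,
                                     boolean honorRetry) {
        // First attempt: the server applies the mutation, but the result never
        // reaches the caller, so it is deliberately discarded here.
        server.checkAndPut(row, value, false, honorRetry);
        // Retry: this is the only response the client actually sees.
        return server.checkAndPut(row, value, true, honorRetry);
    }

    public static void main(String[] args) {
        FakeServer reported = new FakeServer();
        boolean r1 = clientCheckAndPut(reported, "row1", "v1", false);
        System.out.println("reported:  returned=" + r1 + ", stored=" + reported.rows.get("row1"));

        FakeServer suggested = new FakeServer();
        boolean r2 = clientCheckAndPut(suggested, "row1", "v1", true);
        System.out.println("suggested: returned=" + r2 + ", stored=" + suggested.rows.get("row1"));
    }
}
```

In the first run the row is stored but the caller sees false, which is the symptom described here. The second run is only a sketch of the idea; a real fix would also need a reliable way to tell a retry apart from a different client writing the same value.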
In 1.2.11 it always happened. In 2.1.0 it occurred only three times, by
accident; I could reproduce it systematically only by clearing the response
with the debugger before it got to the hconnection thread.
Repro steps:
* Set a breakpoint in the exception handling of
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries to
confirm that the exception is thrown and the call is retried.
* Write an infinite loop that creates different rows with checkAndPut.
* Start running with the breakpoint disabled.
* Enable the breakpoint.
* Pause all threads and verify that the IPC thread is waiting for a response.
Wait for 1-2 minutes. This will cause a timeout.
* Resume and verify that an exception is triggered.
* Add a breakpoint to inspect the response.
* Resume and check the response and the returned value.
was:
In the client, when the MutateRequest times out, we retry the operation in
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries. If the
server received the request but the client failed to get a response, the server
returns processed=false because the value is already there. So the value false
is returned, even though the checkAndPut was successful in the first attempt.
It should return processed=true if the value exists and it is a retry operation.
Example:
request
client ---------o----------> server
client --------------------> server (processing)
client --------------------> server (processed)
(timed out)
retry request
client ---------o----------> server
client --------------------> server (processing)
client ------------------> server (already processed)
response (processed = false)
client <-------o--------- server
checkAndPut returns false, even though it's successful.
In 2.1.0, I could only reproduce it three times by accident, but it always
happened in 1.2.11. In 2.1.0, I could only reproduce it systematically by
cleaning the response with the debugger before it got to the hconnection thread.
Repro steps
* Create a breakpoint in the exception in
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries to make
sure we get the exception and retry.
* Create an infinite loop to create different rows with checkAndPut.
* Start running with the disabled breakpoints.
* Enable the breakpoint.
* Pause all threads and verify that we are waiting for a response in the IPC
thread. Wait for 1-2 minutes. This will cause a timeout.
* Continue and verify that an exception is triggered.
* Add a breakpoint to verify the response.
* Continue and check the response and the returned value.
> checkAndPut (or checkAndMutate) might return false when the row is mutated
> successfully
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-25108
> URL: https://issues.apache.org/jira/browse/HBASE-25108
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0, 1.2.11
> Reporter: Murilo Giacometti Rocha
> Priority: Major
>
> In the client, when a MutateRequest times out, the operation is retried in
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries. If the
> server processed the first request but the client never received the
> response, the retry makes the server return processed=false, because the
> value written by the first attempt is already there. The client therefore
> returns false even though the checkAndPut succeeded on the first attempt. It
> should return processed=true if the value exists and the operation is a
> retry.
>
> Example (RpcRetryingCallerImpl retrying inside checkAndPut):
> request
> client ---------o----------> server
> client --------------------> server (processing)
> client --------------------> server (processed)
> (timed out)
> retry request
> client ---------o----------> server
> client --------------------> server (processing)
> client --------------------> server (already processed)
> response (processed = false)
> client <--------o----------- server
>
> checkAndPut returns false, even though it's successful.
>
> In 1.2.11 it always happened. In 2.1.0 it occurred only three times, by
> accident; I could reproduce it systematically only by clearing the response
> with the debugger before it got to the hconnection thread.
> Repro steps:
> * Set a breakpoint in the exception handling of
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries to
> confirm that the exception is thrown and the call is retried.
> * Write an infinite loop that creates different rows with checkAndPut.
> * Start running with the breakpoint disabled.
> * Enable the breakpoint.
> * Pause all threads and verify that the IPC thread is waiting for a
> response. Wait for 1-2 minutes. This will cause a timeout.
> * Resume and verify that an exception is triggered.
> * Add a breakpoint to inspect the response.
> * Resume and check the response and the returned value.