[ https://issues.apache.org/jira/browse/HBASE-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Murilo Giacometti Rocha updated HBASE-25108:
--------------------------------------------
    Description: 
In the client, when a MutateRequest times out, the operation is retried in
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries. If the
server received and applied the first request but the client never got the
response, the server answers the retry with processed=false, because the value
is already there. checkAndPut therefore returns false even though the first
attempt succeeded. It should return processed=true if the value already exists
and the request is a retry.
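A minimal in-memory sketch of this race follows, in plain Java with no HBase
dependency. FakeRegionServer and callWithRetries are hypothetical stand-ins for
the region server and RpcRetryingCallerImpl#callWithRetries; the check is
assumed to be "cell is absent", and the timeout is simulated by discarding the
first attempt's result after the server has already applied it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical model of the checkAndPut retry race. None of these classes or
 * methods are HBase APIs; they only mimic the request/response sequence.
 */
public class CheckAndPutRetryRace {

    /** Minimal "server": one cell per row, updated by compare-and-set. */
    static final class FakeRegionServer {
        private final Map<String, String> cells = new HashMap<>();

        /** Writes newValue only if the current value equals expected (null = absent). */
        synchronized boolean checkAndPut(String row, String expected, String newValue) {
            if (!Objects.equals(cells.get(row), expected)) {
                return false; // condition not met, nothing written
            }
            cells.put(row, newValue);
            return true;
        }
    }

    /**
     * Hypothetical client retry loop. When dropFirstResponse is true, attempt 1
     * is applied by the server but its response is thrown away, as on a timeout.
     */
    static boolean callWithRetries(FakeRegionServer server, String row, String value,
                                   boolean dropFirstResponse, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean processed = server.checkAndPut(row, null, value);
            if (attempt == 1 && dropFirstResponse) {
                continue; // server already wrote the value; client retries anyway
            }
            return processed;
        }
        throw new IllegalStateException("retries exhausted");
    }

    public static void main(String[] args) {
        FakeRegionServer server = new FakeRegionServer();
        boolean result = callWithRetries(server, "row-1", "v1", true, 3);
        System.out.println("checkAndPut returned " + result);            // prints false
        System.out.println("stored value " + server.cells.get("row-1")); // prints v1
    }
}
```

Running main shows checkAndPut reporting false while the cell already holds v1,
which is the mismatch described in this issue.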

 

Example (RpcRetryingCallerImpl inside checkAndPut):

                       request
client  ---------o----------> server
client  --------------------> server (processing)
client  --------------------> server (processed)
(timed out)

                      retry request
client  ---------o----------> server
client  --------------------> server (processing)
client  --------------------> server (already processed)

                     response (processed = false)
client  <-------o--------- server

checkAndPut returns false, even though the mutation succeeded.
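One possible client-side mitigation, sketched here under the assumption that
each written value is unique to the caller (for example, it embeds a
client-generated nonce): when checkAndPut reports false, read the cell back and
treat the call as successful if it already holds the value this client tried
to write. CellStore, InMemoryStore and checkAndPutVerified are hypothetical
names, not HBase APIs, and this is not the server-side fix the issue asks for.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical read-back check after a checkAndPut that reported false. */
public class VerifyAfterFalse {

    /** Hypothetical storage interface standing in for a table. */
    interface CellStore {
        boolean checkAndPut(String row, String expected, String newValue);
        String get(String row);
    }

    /** In-memory store with compare-and-set semantics (null = cell absent). */
    static final class InMemoryStore implements CellStore {
        private final Map<String, String> cells = new HashMap<>();

        @Override
        public synchronized boolean checkAndPut(String row, String expected, String newValue) {
            if (!Objects.equals(cells.get(row), expected)) {
                return false; // condition not met, nothing written
            }
            cells.put(row, newValue);
            return true;
        }

        @Override
        public synchronized String get(String row) {
            return cells.get(row);
        }
    }

    /**
     * Returns true if uniqueValue is now stored, whether by this call or by an
     * earlier attempt whose response was lost. Assumes uniqueValue is unique
     * to this caller; otherwise another writer's identical value would match.
     */
    static boolean checkAndPutVerified(CellStore store, String row, String expected,
                                       String uniqueValue) {
        if (store.checkAndPut(row, expected, uniqueValue)) {
            return true;
        }
        // false means either "condition failed" or "our own earlier attempt applied it"
        return uniqueValue.equals(store.get(row));
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        // Simulated lost first attempt: applied on the server, response never seen.
        store.checkAndPut("row-1", null, "v1#nonce-42");
        System.out.println(checkAndPutVerified(store, "row-1", null, "v1#nonce-42")); // prints true
        System.out.println(checkAndPutVerified(store, "row-1", null, "v2#nonce-7"));  // prints false
    }
}
```

The read-back is not atomic with the write: if another client overwrites the
cell between the lost attempt and the read, this check still reports false.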

 

In 1.2.11 this happened every time. In 2.1.0 I hit it only three times by
accident; to reproduce it systematically I had to clear the response in the
debugger before it reached the hconnection thread.

Repro steps
 * Set a breakpoint on the exception in
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries to
confirm that the exception is thrown and the call is retried.
 * Write an infinite loop that calls checkAndPut on a different row in each
iteration.
 * Start the loop with the breakpoint disabled.
 * Enable the breakpoint.
 * Pause all threads and verify that the IPC thread is waiting for a response.
Wait 1-2 minutes so that the call times out.
 * Resume and verify that the exception is triggered.
 * Add a breakpoint where the response is handled.
 * Resume, then check the response and the value returned by checkAndPut.
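The debugger steps above force a single lost response by hand. A rough,
debugger-free analogue is to inject the loss directly, as in the sketch below:
it runs the loop from the second step with an "if not exists" condition on a
fresh row per iteration, drops the response of every lossEvery-th RPC, retries
once, and counts calls that returned false even though the row holds that
iteration's value. ReproLoop and lossEvery are hypothetical; the sketch does
not use the HBase client.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical fault-injection version of the repro loop (no HBase client). */
public class ReproLoop {
    private final Map<String, String> cells = new HashMap<>();
    private int rpcCount = 0;

    /** Server side: store value only if the row does not exist yet. */
    private boolean serverCheckAndPutIfAbsent(String row, String value) {
        rpcCount++;
        if (cells.containsKey(row)) {
            return false;
        }
        cells.put(row, value);
        return true;
    }

    /** Client side: on every lossEvery-th RPC the response is "lost" and retried once. */
    private boolean clientCheckAndPut(String row, String value, int lossEvery) {
        boolean processed = serverCheckAndPutIfAbsent(row, value);
        if (rpcCount % lossEvery == 0) {
            // Simulated timeout: the put was applied, but the client never saw the result.
            processed = serverCheckAndPutIfAbsent(row, value); // retry
        }
        return processed;
    }

    /** Writes one row per iteration and returns the number of false negatives. */
    static int countFalseNegatives(int iterations, int lossEvery) {
        ReproLoop loop = new ReproLoop();
        int falseNegatives = 0;
        for (int i = 0; i < iterations; i++) {
            String row = "row-" + i;
            String value = "value-" + i;
            boolean result = loop.clientCheckAndPut(row, value, lossEvery);
            if (!result && value.equals(loop.cells.get(row))) {
                falseNegatives++; // reported false, but our value is stored
            }
        }
        return falseNegatives;
    }

    public static void main(String[] args) {
        System.out.println("false negatives: " + countFalseNegatives(100, 10));
    }
}
```

With lossEvery set to 1 every first response is lost, so every iteration is
counted; with a very large lossEvery none are.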

  was:
In the client, when a MutateRequest times out, the operation is retried in
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries. If the
server received and applied the first request but the client never got the
response, the server answers the retry with processed=false, because the value
is already there. checkAndPut therefore returns false even though the first
attempt succeeded. It should return processed=true if the value already exists
and the request is a retry.

 

Example:

                       request
client  ---------o----------> server
client  --------------------> server (processing)
client  --------------------> server (processed)
(timed out)

                      retry request
client  ---------o----------> server
client  --------------------> server (processing)
client  --------------------> server (already processed)

                     response (processed = false)
client  <-------o--------- server

checkAndPut returns false, even though the mutation succeeded.

 

In 1.2.11 this happened every time. In 2.1.0 I hit it only three times by
accident; to reproduce it systematically I had to clear the response in the
debugger before it reached the hconnection thread.

Repro steps
 * Set a breakpoint on the exception in
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl#callWithRetries to
confirm that the exception is thrown and the call is retried.
 * Write an infinite loop that calls checkAndPut on a different row in each
iteration.
 * Start the loop with the breakpoint disabled.
 * Enable the breakpoint.
 * Pause all threads and verify that the IPC thread is waiting for a response.
Wait 1-2 minutes so that the call times out.
 * Resume and verify that the exception is triggered.
 * Add a breakpoint where the response is handled.
 * Resume, then check the response and the value returned by checkAndPut.


> checkAndPut (or checkAndMutate) might return false when the row is mutated 
> successfully
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-25108
>                 URL: https://issues.apache.org/jira/browse/HBASE-25108
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 1.2.11
>            Reporter: Murilo Giacometti Rocha
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
