[jira] [Commented] (GEODE-697) A client thread timing out an operation and performing further operations can result in cache inconsistency

Hitesh Khamesra (JIRA) Sat, 19 Mar 2016 02:32:49 -0700

    [ 
https://issues.apache.org/jira/browse/GEODE-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200355#comment-15200355
 ]


Hitesh Khamesra commented on GEODE-697:
---------------------------------------

>>If we ignore the event ID on the secondary we will have more inconsistencies. 
>>If a client did put(x,a) and failed over to another server during the 
>>operation but succeeded in finishing it and then did a put(x,b) it's entirely 
>>possible for the put(x,a) to still be in transit to the secondary from the 
>>original attempt.

For the same key x this should not be a problem, since we take the lock on 
entry x on the primary. But for different keys it will work, no?

>>Ignoring the event ID on the server cache won't stop it from being rejected 
>>by client queues, either.
This can be a problem, and client queues may miss the event, but in my opinion 
at least the cache will remain in a consistent state.

> A client thread timing out an operation and performing further operations can 
> result in cache inconsistency
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-697
>                 URL: https://issues.apache.org/jira/browse/GEODE-697
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Dan Smith
>            Assignee: Bruce Schuchardt
>
> There is a case where the primary and secondary buckets of a partitioned 
> region can become out of sync if a client times out while waiting for a slow 
> operation to finish. Here's the scenario:
> 1. An operation is started by the client and gets stuck on the server, for 
> example by a slow cache writer. That operation is assigned an EventID with a 
> sequence number of 1.
> 2. The client times out.
> 3. The client performs a second operation. That operation gets assigned an 
> EventID with a sequence number of 2.
> 4. The second operation is applied on all members. The EventTracker records 
> the sequence number 2.
> 5. The original operation continues. It is applied to the primary (because it 
> has passed the EventTracker test).
> 6. The original operation is rejected by the EventTracker on the secondary. 
> The two copies of the bucket are now inconsistent.
>
> One possible fix is to change the thread id of the thread on the client when 
> the client operation times out. That would ensure that the EventTracker will 
> not reject the original operation when it finally goes through, because it 
> has a different thread id.
>
> If an operation is delayed on the server, for example by a very slow cache 
> writer, the operation can time out on the client.
> The client can then go on and perform a second operation.
> The problem is that each operation is assigned an event id which is a 
> combination of the client's thread id and a sequence number. That second 
> operation has a higher sequence number.
> Once the second operation is applied to a region on a given member, the event 
> is stored in the EventTracker and that member will reject any lower sequence 
> numbers.
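
The scenario in the description, and the proposed thread id change, can be 
replayed with a minimal sketch. This is not Geode's EventTracker: the Member 
class, the per-thread "highest sequence seen" map, and the keys x and y are 
assumptions made for illustration, and the two operations are assumed to touch 
different keys.

```java
import java.util.HashMap;
import java.util.Map;

class EventTrackerSketch {

    /** Hypothetical stand-in for one copy of a bucket plus its event tracker. */
    static final class Member {
        final Map<String, String> data = new HashMap<>();
        // Highest sequence number recorded so far for each client thread id.
        final Map<Long, Long> highestSeen = new HashMap<>();

        /** Applies the put unless an equal or higher sequence number was already recorded. */
        boolean apply(long threadId, long sequence, String key, String value) {
            Long seen = highestSeen.get(threadId);
            if (seen != null && sequence <= seen) {
                return false; // rejected as an old or duplicate event
            }
            highestSeen.put(threadId, sequence);
            data.put(key, value);
            return true;
        }
    }

    /** Replays steps 1 to 6; returns true if both copies hold the same data at the end. */
    static boolean replay(boolean newThreadIdAfterTimeout) {
        Member primary = new Member();
        Member secondary = new Member();

        // Step 1: the first operation (sequence 1) has passed the primary's
        // tracker check and is now stuck, for example in a slow cache writer.
        long stuckThreadId = 1L;
        long stuckSequence = 1L;

        // Step 2: the client times out. With the proposed fix it switches to a
        // new thread id here (assumption: any unused id will do).
        long secondThreadId = newThreadIdAfterTimeout ? 2L : stuckThreadId;

        // Steps 3 and 4: the second operation (sequence 2) is applied on all members.
        primary.apply(secondThreadId, 2L, "y", "b");
        secondary.apply(secondThreadId, 2L, "y", "b");

        // Step 5: the original operation continues on the primary. It already
        // passed the tracker check, so it is applied without a second check.
        primary.data.put("x", "a");

        // Step 6: the secondary consults its own tracker before applying it.
        secondary.apply(stuckThreadId, stuckSequence, "x", "a");

        return primary.data.equals(secondary.data);
    }

    public static void main(String[] args) {
        System.out.println("same thread id: consistent=" + replay(false));
        System.out.println("new thread id:  consistent=" + replay(true));
    }
}
```

With the same thread id, the secondary has already recorded sequence number 2 
and drops the delayed put, so the copies diverge. With a new thread id after 
the timeout, the secondary has no record for the original thread id beyond 
what that operation itself brings, so it accepts the delayed put.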



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
