[
https://issues.apache.org/jira/browse/GEODE-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200355#comment-15200355
]
Hitesh Khamesra commented on GEODE-697:
---------------------------------------
>>If we ignore the event ID on the secondary we will have more inconsistencies.
>>If a client did put(x,a) and failed over to another server during the
>>operation but succeeded in finishing it and then did a put(x,b) it's entirely
>>possible for the put(x,a) to still be in transit to the secondary from the
>>original attempt.
For the same key x this should not be a problem, as we take the lock on entry x
on the primary. But for a different key it will work, no?
>>Ignoring the event ID on the server cache won't stop it from being rejected
>>by client queues, either.
This can be a problem, and client queues may miss the event, but in my opinion
at least the cache will remain in a consistent state.
> A client thread timing out an operation and performing further operations can
> result in cache inconsistency
> -----------------------------------------------------------------------------------------------------------
>
> Key: GEODE-697
> URL: https://issues.apache.org/jira/browse/GEODE-697
> Project: Geode
> Issue Type: Bug
> Reporter: Dan Smith
> Assignee: Bruce Schuchardt
>
> There is a case where the primary and secondary buckets of a partitioned
> region can become out of sync if a client times out while waiting for a slow
> operation to finish. Here's the scenario:
> 1. An operation is started by the client and gets stuck on the server, for
> example by a slow cache writer. That operation is assigned an EventID with a
> sequence number of 1.
> 2. The client times out.
> 3. The client performs a second operation. That operation gets assigned an
> EventID with a sequence number of 2.
> 4. The second operation is applied on all members. The EventTracker records
> the sequence number 2.
> 5. The original operation continues. It is applied to the primary (because it
> has passed the EventTracker test).
> 6. The original operation is rejected by the EventTracker on the secondary.
> The two copies of the bucket are now inconsistent.
> One possible fix is to change the thread id of the thread on the client when
> the client operation times out. That would ensure that the EventTracker will
> not reject the original operation when it finally goes through, because it
> has a different thread id.
> If an operation is delayed on the server, for example by a very slow cache
> writer, the operation can time out on the client.
> The client can then go on and perform a second operation.
> The problem is that each operation is assigned an event id which is a
> combination of the client's thread id and a sequence number. That second
> operation has a higher sequence number.
> Once the second operation is applied to a region on a given member, the event
> is stored in the EventTracker and that member will reject any lower sequence
> numbers
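
The scenario in the issue description, and the proposed thread-id fix, can be
sketched with a toy model. This is a minimal Python sketch, not Geode code: it
assumes a tracker that keeps only the highest sequence number seen per thread
id, and the names EventID, Member, and run_scenario, the keys "x" and "y", and
the thread ids 7 and 8 are all made up for illustration. Geode's real
EventTracker is Java and does more than this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventID:
    thread_id: int  # identifies the client thread (hypothetical value)
    sequence: int   # grows with each operation from that thread


@dataclass
class Member:
    """Toy stand-in for one copy of a bucket plus its event tracker."""
    data: dict = field(default_factory=dict)
    highest_seen: dict = field(default_factory=dict)  # thread_id -> sequence

    def already_seen(self, event):
        # Reject anything at or below the highest sequence for this thread.
        return event.sequence <= self.highest_seen.get(event.thread_id, -1)

    def record(self, event, key, value):
        # Apply the update without re-checking the tracker.
        self.data[key] = value
        seen = self.highest_seen.get(event.thread_id, -1)
        self.highest_seen[event.thread_id] = max(seen, event.sequence)

    def apply(self, event, key, value):
        # Check the tracker first, then apply; returns False if rejected.
        if self.already_seen(event):
            return False
        self.record(event, key, value)
        return True


def run_scenario(change_thread_id_on_timeout):
    primary, secondary = Member(), Member()

    # Steps 1-2: the first put passes the primary's tracker check, then
    # stalls (for example in a slow cache writer) and the client times out.
    first = EventID(thread_id=7, sequence=1)
    passed_check = not primary.already_seen(first)

    # Step 3: the client performs a second put. With the proposed fix the
    # client switches to a fresh thread id after the timeout.
    second_thread = 8 if change_thread_id_on_timeout else 7
    second = EventID(thread_id=second_thread, sequence=2)

    # Step 4: the second put is applied on all members.
    for member in (primary, secondary):
        member.apply(second, "y", "b")

    # Step 5: the stalled put resumes on the primary; its check already passed.
    if passed_check:
        primary.record(first, "x", "a")

    # Step 6: the put reaches the secondary, which runs its own check.
    secondary.apply(first, "x", "a")

    return primary.data, secondary.data


if __name__ == "__main__":
    print("same thread id:   ", run_scenario(False))
    print("thread id changed:", run_scenario(True))
```

With the same thread id the secondary drops the put on "x" and the two copies
diverge; with a changed thread id the late put is accepted on both. The sketch
uses two different keys on purpose, so it says nothing about per-entry locking
on the primary.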