[
https://issues.apache.org/jira/browse/GEODE-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan McMahon resolved GEODE-6607.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.10.0
> Possible client subscription data inconsistency due to race between
> retrieving filter info and distributing event
> -----------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-6607
> URL: https://issues.apache.org/jira/browse/GEODE-6607
> Project: Geode
> Issue Type: Bug
> Components: client queues
> Reporter: Ryan McMahon
> Assignee: Ryan McMahon
> Priority: Major
> Fix For: 1.10.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> A client can miss subscription events (from either a CQ or register
> interest) in the following scenario:
> A cluster has four servers, and client subscription redundancy is set to 2.
> The client's primary subscription endpoint is on server 1, and the redundant
> copies of its queue are on servers 2 and 3. Server 2 is killed or lost to a
> network partition, so we attempt to restore redundancy by copying the client
> queue from server 3 to server 4.
> Two things happen when server 4 gets the client queue from server 3. First,
> we request the client's filter info, which represents its CQ and register
> interest info. Second, we perform the GII (get initial image) to get the
> image of the queue.
> A race can occur when an event is distributed across the cluster while
> server 4 is initializing the client queue. If server 4 processes the event
> before it has retrieved the filter info, the event will not match the client
> subscription filter, because the filter does not exist on server 4 yet. If
> server 3 then processes the same event after the GII has started, the event
> will not be part of the client queue image either. The event is therefore
> never added to server 4's copy of the client queue and is lost.
> We have a special queue for handling events while a client is initializing,
> but it sits at too low a level (MessageDispatcher) to handle this scenario.
> One possible solution is to move this special queue to a higher level
> (CacheClientNotifier or CacheClientProxy) so that the event is queued before
> we even attempt to get the filter info. Then, when initialization finishes,
> we drain the queue, check each event against the initialized client's
> filter, and send along those that match. A similar solution could be
> implemented on the GII provider side, but it might be a bit messier.
>
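The proposed fix (queue events before the filter info is available, then
drain and filter once initialization finishes) could be sketched roughly as
follows. This is an illustration only, not the change that was committed for
this issue: the class InitializingClientBuffer, its Event type, and its
methods are hypothetical names and do not correspond to Geode's
CacheClientNotifier or CacheClientProxy APIs. Events are assumed to carry a
unique id so that an event arriving through both the queue image and the
buffer is enqueued only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch (not Geode code): buffers events while a client queue initializes. */
final class InitializingClientBuffer {

  /** Minimal stand-in for a distributed event: a unique id plus the affected key. */
  static final class Event {
    final long id;
    final String key;

    Event(long id, String key) {
      this.id = id;
      this.key = key;
    }
  }

  // Events received before initialization finished, in arrival order.
  private final List<Event> pending = new ArrayList<>();
  // Stands in for the client queue; keyed by event id to drop duplicates.
  private final Map<Long, Event> clientQueue = new LinkedHashMap<>();
  // Remains null until the client's filter info has been retrieved.
  private Predicate<Event> filter;

  /** Called for every event distributed to this server. */
  synchronized void onEvent(Event event) {
    if (filter == null) {
      // No filter yet: hold the event instead of treating it as a non-match.
      pending.add(event);
    } else if (filter.test(event)) {
      clientQueue.putIfAbsent(event.id, event);
    }
  }

  /** Called once the filter info is known and the queue image has been received. */
  synchronized void finishInitialization(Predicate<Event> clientFilter, List<Event> queueImage) {
    // The image was already filtered by the server that provided it.
    for (Event event : queueImage) {
      clientQueue.putIfAbsent(event.id, event);
    }
    filter = clientFilter;
    // Drain the buffer, keeping only the events the client subscribed to.
    for (Event event : pending) {
      if (filter.test(event)) {
        clientQueue.putIfAbsent(event.id, event);
      }
    }
    pending.clear();
  }

  /** Ids of the queued events, in queue order. */
  synchronized List<Long> queuedIds() {
    return new ArrayList<>(clientQueue.keySet());
  }
}
```

Because onEvent and finishInitialization share one lock in this sketch, an
event either lands in the buffer before the drain or is filtered directly
afterwards, so it cannot fall between the two steps.
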