[
https://issues.apache.org/jira/browse/GEODE-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823513#comment-16823513
]
ASF subversion and git services commented on GEODE-6607:
--------------------------------------------------------
Commit afc311c04f6908a8f725834cdf2c49ce6e902b3f in geode's branch
refs/heads/develop from Ryan McMahon
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=afc311c ]
GEODE-6607: Moving client registration queue to CacheClientNotifier
To avoid client subscription data inconsistencies, we need to minimize
the chance that an event is processed after a client has started
registering but before it has fully registered. Registration has two
major phases: requesting filter info from a peer that already hosts the
client's queue, and performing a GII (get initial image) of the queue
from a peer. If an event the client is interested in is processed
during registration, before the filter info has been fully received and
processed, the client will miss the event. To narrow this window, we
start queueing events for the registering client as early as possible
(as soon as the client proxy membership ID is deserialized). After
registration is complete, we drain the queued events and put them into
the client's subscription queue.
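The queue-then-drain approach can be illustrated with a minimal sketch. It assumes a simplified event model. The names RegistrationQueueSketch, Event, beginRegistration, processEvent and completeRegistration are hypothetical and are not Geode's CacheClientNotifier API. Events are buffered per registering client and, once the client's filter is known, drained through it into the subscription queue. Every method is synchronized so the handoff from buffer to filter is atomic. The real code has to handle finer-grained concurrency and duplicate suppression, which this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

public class RegistrationQueueSketch {
  /** Hypothetical stand-in for a cache event; only the key matters here. */
  public record Event(String key) {}

  // Events buffered for clients that have started, but not finished, registering.
  private final Map<String, Queue<Event>> registrationQueues = new HashMap<>();
  // Interest filters of fully registered clients.
  private final Map<String, Predicate<Event>> filters = new HashMap<>();
  // Stand-in for each registered client's subscription queue.
  private final Map<String, List<Event>> subscriptionQueues = new HashMap<>();

  /** Called as early as possible, e.g. once the client's ID has been deserialized. */
  public synchronized void beginRegistration(String clientId) {
    registrationQueues.put(clientId, new ArrayDeque<>());
  }

  /** Called for every event this server processes. */
  public synchronized void processEvent(Event event) {
    // Buffer for every client still registering; its filter may not exist yet.
    for (Queue<Event> queue : registrationQueues.values()) {
      queue.add(event);
    }
    // Deliver directly to registered clients whose filter matches.
    for (Map.Entry<String, Predicate<Event>> entry : filters.entrySet()) {
      if (entry.getValue().test(event)) {
        subscriptionQueues.get(entry.getKey()).add(event);
      }
    }
  }

  /** Called once the filter info and the queue image have been received. */
  public synchronized void completeRegistration(String clientId, Predicate<Event> filter) {
    List<Event> subscriptionQueue = new ArrayList<>();
    subscriptionQueues.put(clientId, subscriptionQueue);
    filters.put(clientId, filter);
    Queue<Event> buffered = registrationQueues.remove(clientId);
    if (buffered != null) {
      // Drain buffered events, keeping only those the now-known filter matches.
      for (Event event : buffered) {
        if (filter.test(event)) {
          subscriptionQueue.add(event);
        }
      }
    }
  }

  public synchronized List<Event> subscriptionQueue(String clientId) {
    return subscriptionQueues.get(clientId);
  }

  public static void main(String[] args) {
    RegistrationQueueSketch notifier = new RegistrationQueueSketch();
    notifier.beginRegistration("client-1");
    notifier.processEvent(new Event("orders/1")); // arrives mid-registration
    notifier.processEvent(new Event("users/7"));  // not of interest to the client
    notifier.completeRegistration("client-1", e -> e.key().startsWith("orders/"));
    notifier.processEvent(new Event("orders/2")); // arrives after registration
    // prints [Event[key=orders/1], Event[key=orders/2]]
    System.out.println(notifier.subscriptionQueue("client-1"));
  }
}
```

The event that arrived mid-registration is neither lost nor delivered blindly. It waits until the filter exists and is then matched like any other event.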
To make this code unit testable, it was necessary to extract the logic
for reading data off the socket and deserializing it into a separate,
injectable class, ClientRegistrationMetadata. This allows us to mock a
ClientRegistrationMetadata without doing any actual IO. The
CacheClientNotifier could be broken up further to allow even more unit
testability, but this is a first step in the right direction.
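The injection seam described above can be sketched as follows. RegistrationMetadata, StreamRegistrationMetadata and Notifier are hypothetical names. The one-string wire format is an assumption for illustration only and does not mirror what the real ClientRegistrationMetadata reads. The point is that the class under test depends on an interface, so a test can supply metadata without touching a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class MetadataInjectionSketch {
  /** Hypothetical seam: the data a registering client sends over the socket. */
  interface RegistrationMetadata {
    String clientId();
  }

  /** IO-backed implementation: deserializes the metadata from a stream. */
  static final class StreamRegistrationMetadata implements RegistrationMetadata {
    private final String clientId;

    StreamRegistrationMetadata(InputStream in) throws IOException {
      // Assumed wire format, for illustration only: one modified-UTF-8 string.
      this.clientId = new DataInputStream(in).readUTF();
    }

    @Override
    public String clientId() {
      return clientId;
    }
  }

  /** Depends only on the interface, so tests never need a real socket. */
  static final class Notifier {
    String register(RegistrationMetadata metadata) {
      return "registered " + metadata.clientId();
    }
  }

  public static void main(String[] args) throws IOException {
    Notifier notifier = new Notifier();

    // Unit-test style: a lambda stands in for the metadata, with no IO at all.
    System.out.println(notifier.register(() -> "client-1"));

    // Stream-backed implementation, fed from an in-memory stream that stands in
    // for the socket's input stream.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new DataOutputStream(bytes).writeUTF("client-2");
    RegistrationMetadata fromStream =
        new StreamRegistrationMetadata(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(notifier.register(fromStream));
  }
}
```

A mocking library such as Mockito could replace the lambda when the metadata interface has more than one method.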
Co-authored-by: Ryan McMahon
Co-authored-by: Murtuza Boxwala
Co-authored-by: Ernie Burghardt
> Possible client subscription data inconsistency due to race between
> retrieving filter info and distributing event
> -----------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-6607
> URL: https://issues.apache.org/jira/browse/GEODE-6607
> Project: Geode
> Issue Type: Bug
> Components: client queues
> Reporter: Ryan McMahon
> Assignee: Ryan McMahon
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> A client can miss subscription events (from either CQs or registered
> interest) in the following scenario:
> There are four servers in a cluster, with redundant copies set to 2 for
> client subscriptions. The client's primary subscription endpoint is on
> server 1, and its redundant copies are on servers 2 and 3. Server 2 is
> killed or lost due to a network partition, so we attempt to restore
> redundancy by copying the client queue from server 3 to server 4.
> Two things happen when server 4 gets the client queue from server 3. First,
> it requests the client's filter info, which represents the client's CQ and
> register interest info. Second, it performs the GII to get an image of the
> queue.
> A race can occur when an event is distributed across the cluster while
> server 4 is initializing the client queue. If server 4 processes the
> distributed event before it retrieves the filter info, the event will not
> match the client's subscription filter, because that filter does not exist
> on server 4 yet. Then, if server 3 processes the event after the GII has
> started, the event will not be part of the client queue image. Therefore,
> the event is never added to server 4's copy of the client queue and is lost.
> We have a special queue for handling events while a client is initializing,
> but it is at too low a level (MessageDispatcher) to handle this scenario.
> One possible solution is to move this special queue to a higher level
> (CacheClientNotifier or CacheClientProxy) so that events are queued before
> we even attempt to get filter info. Then, when initialization finishes, we
> drain the queue, check each event against the initialized client's filter,
> and send it along if it matches. A similar solution could be applied on the
> GII provider side, but it might be messier.
>
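The interleaving described in the issue can be replayed serially. This shows why the event is lost and how buffering from the start of registration recovers it. It is a deterministic, single-threaded sketch, not Geode code. The servers are modeled as plain lists, and the class, method and variable names are hypothetical. Deduplicating the drained buffer against the GII image by record equality is a simplifying assumption; a real implementation would need a proper event identity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class SubscriptionRaceSketch {
  /** Hypothetical stand-in for a distributed cache event. */
  record Event(String key) {}

  /**
   * Replays the issue's interleaving and returns server 4's copy of the client queue.
   * When queueDuringRegistration is true, server 4 buffers events from the start of registration.
   */
  static List<Event> restoreRedundancy(boolean queueDuringRegistration) {
    Event event = new Event("orders/1");
    Predicate<Event> clientFilter = e -> e.key().startsWith("orders/");
    List<Event> server3Queue = new ArrayList<>();       // server 3 already hosts the queue
    List<Event> registrationBuffer = new ArrayList<>(); // server 4's early buffer (fix only)

    // Step 1: the event reaches server 4 before its filter info does. Without the buffer
    // there is no filter to match against yet, so the event is simply not queued.
    if (queueDuringRegistration) {
      registrationBuffer.add(event);
    }

    // Step 2: server 4 receives the client's filter info from server 3.
    Predicate<Event> server4Filter = clientFilter;

    // Step 3: server 3 takes the GII image of the client queue for server 4.
    List<Event> image = new ArrayList<>(server3Queue);

    // Step 4: server 3 processes the event after the image was taken.
    if (clientFilter.test(event)) {
      server3Queue.add(event);
    }

    // Step 5: server 4 initializes its copy of the client queue from the image.
    List<Event> server4Queue = new ArrayList<>(image);

    // Step 6 (fix only; the buffer is empty otherwise): drain the buffer through the
    // now-known filter, skipping anything the image already contained.
    for (Event buffered : registrationBuffer) {
      if (server4Filter.test(buffered) && !server4Queue.contains(buffered)) {
        server4Queue.add(buffered);
      }
    }
    return server4Queue;
  }

  public static void main(String[] args) {
    System.out.println("without early queueing: " + restoreRedundancy(false)); // []
    System.out.println("with early queueing:    " + restoreRedundancy(true));  // [Event[key=orders/1]]
  }
}
```

Without early queueing, server 4 ends up with an empty queue even though server 3's queue holds the event. With early queueing, the buffered event survives until the filter is known and is then added.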