Kenneth Howe created GEODE-5513:
-----------------------------------
Summary: Clients may miss PR region events due to race during
registerInterest
Key: GEODE-5513
URL: https://issues.apache.org/jira/browse/GEODE-5513
Project: Geode
Issue Type: Bug
Components: client queues
Reporter: Kenneth Howe
Here is the scenario:
Consider two servers and two clients:
- Server1 hosts the primary bucket
- Server2 hosts the secondary bucket and also the primary queue for Client2
- Client1 performs a remove-all operation
- Client2 registers interest
The sequence of events:
- Client1 starts the remove-all operation.
- At the same time, Client2 starts registering interest.
- Server1 receives the remove-all operation, processes it, and sends the
adjunct message to Server2. (Server1 has not yet received the interest info
for Client2.)
- While the remove-all message to Server2 is still in flight, Server2 sends
the interest profile info for Client2 to Server1. Server2, as the host of the
primary queue, then starts building the initial image snapshot for the
interest. When building the initial image for a PR, preference is given to
collecting data from the local node. At this point the removal message is
still in flight and has not been applied on Server2, so the initial image for
the interest registration is calculated from stale local data and sent to the
client, missing the remove-all op.
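
The ordering above can be reproduced with a small, single-threaded toy model.
This is only a sketch: the two maps stand in for the bucket copies on Server1
and Server2, the queue stands in for the in-flight replication message, and
every class and method name here is hypothetical (none of it is Geode code).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

// Toy model only: none of these names are Geode classes or APIs.
public class RegisterInterestRace {
    // Bucket copy on Server1 (primary) and on Server2 (secondary).
    static final Map<String, String> primaryBucket = new TreeMap<>();
    static final Map<String, String> secondaryBucket = new TreeMap<>();
    // Removals already applied on Server1 but not yet applied on Server2.
    static final Queue<List<String>> inFlightToServer2 = new ArrayDeque<>();

    static void put(String key, String value) {
        primaryBucket.put(key, value);
        secondaryBucket.put(key, value);
    }

    // Server1 applies the remove-all locally and queues it for Server2.
    static void removeAllOnPrimary(List<String> keys) {
        primaryBucket.keySet().removeAll(keys);
        inFlightToServer2.add(keys);
    }

    // The in-flight message finally arrives and is applied on Server2.
    static void deliverPendingToServer2() {
        List<String> keys;
        while ((keys = inFlightToServer2.poll()) != null) {
            secondaryBucket.keySet().removeAll(keys);
        }
    }

    // Server2 builds the initial image from its local copy.
    static Map<String, String> localSnapshotOnServer2() {
        return new TreeMap<>(secondaryBucket);
    }

    static Map<String, String> runScenario() {
        primaryBucket.clear();
        secondaryBucket.clear();
        inFlightToServer2.clear();
        put("k1", "v1");
        put("k2", "v2");
        put("k3", "v3");
        removeAllOnPrimary(Arrays.asList("k1", "k2"));   // Client1's remove-all
        Map<String, String> sentToClient2 = localSnapshotOnServer2(); // Client2 registers
        deliverPendingToServer2();                       // removal arrives too late
        return sentToClient2;
    }

    public static void main(String[] args) {
        Map<String, String> sentToClient2 = runScenario();
        System.out.println("initial image sent to Client2: " + sentToClient2.keySet());
        System.out.println("primary bucket on Server1:     " + primaryBucket.keySet());
    }
}
```

In this model the image sent to Client2 still contains the removed keys, even
though both bucket copies no longer hold them once the message is delivered.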
This could also happen with non-bulk ops, but it is worse with bulk ops
because they take longer to replicate, which widens the window for the race.
The solution is to build the initial register interest response from the
data in the primary bucket. This adds a little overhead to building the
interest response, but since the register interest response will in most
cases (if not always) involve a remote node anyway, the overhead may be
negligible.
Clients registering interest in a region
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)