[ 
https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460939#comment-17460939
 ] 

Xiaojian Zhou commented on GEODE-8644:
--------------------------------------

The root cause is: 

When a CME occurs, AbstractRegionMap calls notifyTimestampsToGateways(), which
enqueues a gateway event with the UPDATE_VERSION operation.

On the server that holds the secondary queue, this event is ignored and
handleSecondaryEvent() is not called. On the primary queue holder, however, the
event is still queued and an unprocessedToken is added. Because no corresponding
event ever arrives at the secondary queue to trigger removal of that token, the
token is leaked every time this scenario occurs.

This code and behavior are very old, dating back to 8.2. We did not find the
problem earlier for two reasons: 1) it is a rare race, and 2) we had no test
that deliberately checked unprocessedTokens draining until GEODE-7643
introduced one.

There are several ways to fix it.
One alternative is to not enqueue this kind of event into the primary queue, as
we already do for the secondary queue. But this alternative changes the current
logic and assumptions, so it is risky.

So I chose only to skip adding this kind of event to unprocessedTokens. This
fix is very conservative.
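A minimal, hypothetical model of the token bookkeeping described above may help
illustrate the leak and the fix. It is not Geode's code: `Op`, `Event`,
`TokenTracker`, and `run` are invented for illustration, a single map stands in
for the real unprocessedTokens state, and `onSecondaryEvent` only loosely
corresponds to handleSecondaryEvent(). It shows why an UPDATE_VERSION event
leaves a token behind when the secondary ignores it, and how skipping the token
for that event type lets the map drain.

```java
import java.util.HashMap;
import java.util.Map;

public class UnprocessedTokenLeakSketch {
  // Hypothetical stand-in for the gateway event operations relevant here.
  enum Op { CREATE, UPDATE, UPDATE_VERSION }

  record Event(long id, Op op) {}

  // Hypothetical model of the unprocessedTokens bookkeeping (not Geode's classes).
  static final class TokenTracker {
    private final Map<Long, Op> unprocessedTokens = new HashMap<>();
    private final boolean skipUpdateVersion; // true = model the conservative fix

    TokenTracker(boolean skipUpdateVersion) {
      this.skipUpdateVersion = skipUpdateVersion;
    }

    // Primary side: the event is always queued, and a token is recorded
    // until a matching event is seen on the secondary side.
    void onPrimaryEnqueue(Event e) {
      if (skipUpdateVersion && e.op() == Op.UPDATE_VERSION) {
        return; // fix: never track a token the secondary can never clear
      }
      unprocessedTokens.put(e.id(), e.op());
    }

    // Secondary side: UPDATE_VERSION events are ignored, so the removal
    // path is never reached for them.
    void onSecondaryEvent(Event e) {
      if (e.op() == Op.UPDATE_VERSION) {
        return;
      }
      unprocessedTokens.remove(e.id());
    }

    int size() {
      return unprocessedTokens.size();
    }
  }

  // Feeds the same events through both sides and returns the leftover token count.
  public static int run(boolean applyFix) {
    TokenTracker tracker = new TokenTracker(applyFix);
    Event[] events = {
      new Event(1, Op.CREATE), new Event(2, Op.UPDATE_VERSION), new Event(3, Op.UPDATE)
    };
    for (Event e : events) {
      tracker.onPrimaryEnqueue(e);
      tracker.onSecondaryEvent(e);
    }
    return tracker.size();
  }

  public static void main(String[] args) {
    System.out.println("without fix: " + run(false)); // prints 1 (leaked token)
    System.out.println("with fix:    " + run(true));  // prints 0 (map drains)
  }
}
```

In this model the guard lives only on the token-recording path, mirroring the
conservative choice above: the event itself is still enqueued on the primary,
only the token that could never be cleared is skipped.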


> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> intermittently fails when queues drain too slowly
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8644
>                 URL: https://issues.apache.org/jira/browse/GEODE-8644
>             Project: Geode
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Benjamin P Ross
>            Assignee: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeOperationAPI, needsTriage, pull-request-available
>
> Currently the test
> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely()
> relies on a 2-second delay to allow the queues to finish draining after
> the put operation completes. If the queues take longer than 2 seconds to
> drain, the test fails. We should change the test to wait for the queues to
> be empty, with a long timeout in case the queues never fully drain.
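The change proposed in the issue (poll until the queues are empty, bounded by a
long timeout, instead of sleeping a fixed 2 seconds) can be sketched with the
JDK alone. Geode's own DUnit tests would more likely use an Awaitility-based
helper for this; `WaitUntil`, `waitUntil`, and the simulated `queueSize` below
are hypothetical names used only for illustration.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public final class WaitUntil {
  private WaitUntil() {}

  /**
   * Polls the condition until it returns true or the timeout elapses.
   * Returns true if the condition was met, false on timeout or interruption.
   */
  public static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      long remainingNanos = deadline - System.nanoTime(); // overflow-safe comparison
      if (remainingNanos <= 0) {
        return false;
      }
      // Never sleep past the deadline, and always sleep at least 1 ms.
      long sleepMillis = Math.min(pollInterval.toMillis(), Math.max(1L, remainingNanos / 1_000_000L));
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return false;
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical queue drained by a background thread in roughly 3 seconds,
    // which would exceed a fixed 2-second sleep.
    AtomicInteger queueSize = new AtomicInteger(30);
    Thread drainer = new Thread(() -> {
      while (queueSize.get() > 0) {
        queueSize.decrementAndGet();
        try {
          Thread.sleep(100);
        } catch (InterruptedException e) {
          return;
        }
      }
    });
    drainer.start();
    boolean drained = waitUntil(() -> queueSize.get() == 0, Duration.ofMinutes(5), Duration.ofMillis(50));
    drainer.join();
    System.out.println("drained: " + drained); // prints "drained: true"
  }
}
```

The long timeout only bounds the failure case; when the queues drain normally,
the wait returns as soon as they are empty, so the test is not slowed down.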



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
