Darrel Schneider commented on GEODE-1885:

This fix caused a deadlock. The deadlock can happen if an offheap region is 
being destroyed while concurrent modifications are in progress and a clear is 
also done on it.

The deadlock is caused by the code setting the offheap region entry value to a 
REMOVE token but not throwing an exception. This causes the higher level code 
to leave the entry in the map (if we had thrown an exception, the higher level 
code would have removed the entry from the map). Another thread that holds 
the RVV read lock then keeps finding this entry with the REMOVE token, retrying, 
and finding it again. Because that thread holds the RVV read lock, it blocks the 
clear, which is trying to get the RVV write lock. The clear in turn blocks the 
region destroy from completing, because the destroy waits for an in-progress 
clear.
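
The lock interaction described above can be reproduced in miniature. The 
following is only a sketch under stated assumptions, not Geode's actual code: 
the class RemoveTokenSpinSketch, the REMOVED sentinel, the plain 
ConcurrentHashMap, and the ReentrantReadWriteLock standing in for the RVV lock 
are all hypothetical stand-ins. The spin loop and the write-lock attempt are 
bounded so that the sketch terminates instead of really deadlocking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RemoveTokenSpinSketch {
    // Hypothetical stand-in for the REMOVE token stored as an entry value.
    static final Object REMOVED = new Object();

    /**
     * Returns true if a "clear" could take the write lock within 200 ms.
     * entryRemovedFromMap models the higher level code removing the entry,
     * which is what would have happened had an exception been thrown.
     */
    static boolean clearCanProceed(boolean entryRemovedFromMap) {
        ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
        ReentrantReadWriteLock rvvLock = new ReentrantReadWriteLock(); // stand-in for the RVV lock
        AtomicBoolean giveUp = new AtomicBoolean(false);               // bounds the spin for this demo
        CountDownLatch readLockHeld = new CountDownLatch(1);

        // A failed modification left the entry holding the REMOVE token.
        map.put("k", REMOVED);
        if (entryRemovedFromMap) {
            map.remove("k", REMOVED);
        }

        Thread modifier = new Thread(() -> {
            rvvLock.readLock().lock();
            try {
                readLockHeld.countDown();
                while (!giveUp.get()) {
                    Object existing = map.putIfAbsent("k", "v");
                    if (existing != REMOVED) {
                        break; // inserted, or found a real value: done
                    }
                    Thread.yield(); // saw the REMOVE token again: retry
                }
            } finally {
                rvvLock.readLock().unlock();
            }
        });
        modifier.start();

        try {
            readLockHeld.await();
            // The "clear" needs the write lock; it cannot get it while the
            // modifier keeps spinning under the read lock.
            boolean acquired = rvvLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                rvvLock.writeLock().unlock();
            }
            giveUp.set(true);
            modifier.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("entry left in map: clear proceeds = " + clearCanProceed(false));
        System.out.println("entry removed:     clear proceeds = " + clearCanProceed(true));
    }
}
```

In this sketch, leaving the REMOVED entry in the map keeps the modifier 
spinning under the read lock, so the write-lock attempt times out. Removing 
the entry lets the modifier finish and release the read lock, after which the 
write lock can be taken.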

> Missing subscription event with Offheap partitioned region during bucket 
> rebalance.
> -----------------------------------------------------------------------------------
>                 Key: GEODE-1885
>                 URL: https://issues.apache.org/jira/browse/GEODE-1885
>             Project: Geode
>          Issue Type: Bug
>          Components: offheap
>            Reporter: Anilkumar Gingade
>            Assignee: Darrel Schneider
>             Fix For: 1.0.0-incubating
> During a transaction operation, if a concurrent redundant bucket rebalance 
> is in progress, the client can miss a subscription event if its primary 
> queue is hosted on the node the bucket is moved from.
> Consider a three-node cluster N1, N2 and N3, with:
> - Client C1 connected to node N2.
> - Primary bucket region B1 on N1, and the secondary bucket for B1 on N2.
> - A transaction is started on N2, which creates an entry on B1.
> - The TX is committed. At the same time, the bucket B1 on N2 is moved to 
> N3.
> - The TX commit message from N1 is sent to N2. This also includes the 
> subscription message, satisfying the client C1.
> - On N2, for an offheap region, when the bucket is not found locally, the 
> exception response is sent back to N1 without processing the subscription 
> message.
