If you do a put with the local option, it won't replicate to anyone, so the 
node that did the put will be out of sync with its buddies.

As to multiple nodes simultaneously doing a put on the same node, here's what 
happens. I'm assuming the node already exists.  

Assume no tx is running.  The data in question is stored on server0 and its 
buddy group.

1) You do a put() on server1. Simultaneously there is a put() on server2.
2) DataGravitatorInterceptor.1 and DataGravitatorInterceptor.2 both see that 
the node doesn't exist locally; each fetches the node's data from across the 
cluster.
3) DataGravitatorInterceptor.1 and .2 take the data and do a put (not local).  
This replicates the data to each server's buddies. No tx, so no lock is held 
on the node. 
At this point there are three copies of the data -- the server0 group's, the 
server1 group's and the server2 group's.
4) DataGravitatorInterceptor.1 and .2 send a cleanup call to the cluster.  Any 
copy of the data not associated with the sending server's buddy group is 
removed.
5) The original puts go through.

The end result here will very much depend on how things get interleaved. With 
REPL_SYNC you could end up with a TimeoutException in step 4 as server1 and 
server2 tell each other to remove the data and deadlock. Or server1 completes 
steps 3-5 and then server2 executes steps 3-5, in which case server2's change 
wins. Or both complete step 3, then server1 completes step 4 (so the server0 
and server2 copies are gone), then server2 completes step 4 (so the server1 
copy is gone). Then they both complete step 5, resulting in 2 sets of data, 
each of which only has the key/value pair included in its own put.
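
To make the last two interleavings concrete, here is a toy model in plain Java. 
It is only a sketch and does not use any JBoss Cache classes: the class 
GravitationSketch, its methods and the sample attribute a=1 are all made up for 
illustration, and each buddy group is reduced to a single map entry. It replays 
steps 2-5 by hand in the two orders described above.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: NOT the JBoss Cache API. All names here are hypothetical.
class GravitationSketch {
    // Buddy group owner -> that group's copy of the node's attributes.
    final Map<String, Map<String, String>> copies = new TreeMap<>();

    GravitationSketch() {
        // Assumed starting state: the node lives in server0's group with a=1.
        Map<String, String> original = new TreeMap<>();
        original.put("a", "1");
        copies.put("server0", original);
    }

    // Step 2: fetch a snapshot of the data from whichever group has it.
    Map<String, String> fetch() {
        for (Map<String, String> data : copies.values()) {
            return new TreeMap<>(data);
        }
        return new TreeMap<>();
    }

    // Step 3: non-local put of the fetched data into the server's own group.
    void store(String server, Map<String, String> fetched) {
        copies.put(server, new TreeMap<>(fetched));
    }

    // Step 4: cleanup call; every copy outside the sender's group is removed.
    void cleanup(String server) {
        copies.keySet().removeIf(owner -> !owner.equals(server));
    }

    // Step 5: the original put goes through.
    void put(String server, String key, String value) {
        copies.computeIfAbsent(server, s -> new TreeMap<>()).put(key, value);
    }

    // server1 runs steps 3-5, then server2 runs steps 3-5.
    static Map<String, Map<String, String>> serial() {
        GravitationSketch c = new GravitationSketch();
        Map<String, String> f1 = c.fetch();
        Map<String, String> f2 = c.fetch();
        c.store("server1", f1); c.cleanup("server1"); c.put("server1", "k1", "v1");
        c.store("server2", f2); c.cleanup("server2"); c.put("server2", "k2", "v2");
        return c.copies;
    }

    // Both finish step 3, then the two cleanups cross, then both do step 5.
    static Map<String, Map<String, String>> crossed() {
        GravitationSketch c = new GravitationSketch();
        Map<String, String> f1 = c.fetch();
        Map<String, String> f2 = c.fetch();
        c.store("server1", f1);
        c.store("server2", f2);
        c.cleanup("server1"); // server0 and server2 copies are gone
        c.cleanup("server2"); // server1 copy is gone
        c.put("server1", "k1", "v1");
        c.put("server2", "k2", "v2");
        return c.copies;
    }

    public static void main(String[] args) {
        System.out.println(serial());  // {server2={a=1, k2=v2}}
        System.out.println(crossed()); // {server1={k1=v1}, server2={k2=v2}}
    }
}
```

In the serial order only server2's group survives, with the original data plus 
server2's key. In the crossed order the original data is lost and each group 
holds only its own key/value pair.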

Now, if there is a tx in place:

The put() in step 3 is done in a tx, so a write lock will be held on the node 
on each server until the tx commits.  The put will not replicate until the tx 
commits.

The removes in step 4 will also not be broadcast until the tx commits.

The put in step 5 will not be replicated until the tx commits.

The fact that the WL from step 3 is held should make steps 3-5 atomic.  If it's 
REPL_SYNC, you have two servers trying to write to the same node, so it's 
possible when the tx tries to commit you'll get a TimeoutException due to a lock 
conflict.  With REPL_ASYNC, the later tx will win; the step 5 put from the 
earlier tx will be lost.

But... while writing this I'm pretty sure I've spotted a bug in the tx case.  
The step 4 cleanup call gets bundled together with the other tx changes and 
therefore only gets replicated to the server's buddies, not to the whole 
cluster.
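
A rough way to picture the suspected bug, again as a toy model rather than 
real JBoss Cache code. The four-server layout, the buddy assignments (server3 
backs up server0, server2 backs up server1) and the class TxCleanupSketch are 
assumptions made up for this sketch. It just computes which servers still hold 
the old copy, depending on who receives the cleanup call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: NOT the JBoss Cache API. Layout and names are hypothetical.
class TxCleanupSketch {
    static final List<String> CLUSTER =
            Arrays.asList("server0", "server1", "server2", "server3");
    // Assumed: server1 is the gravitating server and server2 is its buddy.
    static final List<String> BUDDIES_OF_SERVER1 = Arrays.asList("server2");
    // Assumed: the old copy lives on server0 and its buddy server3.
    static final List<String> OLD_HOLDERS = Arrays.asList("server0", "server3");

    // Servers that still hold the old copy after server1's cleanup call.
    static Set<String> staleHolders(boolean inTx) {
        // No tx: the cleanup is sent to the whole cluster.
        // Tx: the cleanup rides along with the tx changes, so it only
        // reaches server1's buddies.
        List<String> recipients = inTx ? BUDDIES_OF_SERVER1 : CLUSTER;
        Set<String> stale = new TreeSet<>(OLD_HOLDERS);
        stale.removeAll(recipients);
        return stale;
    }

    public static void main(String[] args) {
        System.out.println(staleHolders(false)); // []
        System.out.println(staleHolders(true));  // [server0, server3]
    }
}
```

If the bug is real, the tx case would leave the old copy on server0's group 
while server1's group also has the data.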
