On 01/06/2011 05:43 PM, Mircea Markus wrote:
>
> On 6 Jan 2011, at 14:45, Jonathan Halliday wrote:
>
>> On 01/06/2011 02:29 PM, Mircea Markus wrote:
>>
>>> At the moment the *only way to transactionally* access a
>>> node is by collocating the client and the server in the same
>>> VM.
>>
>> So the scope of the transaction is limited to data
>> residing in that local node? What if I want a single
>> transaction to span the local node and data in a remote node?
>
> that's possible. It's just that you have to always interact
> with the local node that will acquire remote locks remotely
> on behalf of your transaction.
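In code terms, the interaction being described is roughly the following - a minimal sketch only, assuming an embedded cache configured for transactions and obtained elsewhere; the class, method and key names are made up for illustration:

import javax.transaction.TransactionManager;
import org.infinispan.Cache;

// Minimal sketch: the client only ever talks to its local, embedded cache.
// Lock acquisition on remote owners happens behind the put() calls, on
// behalf of the transaction, as described in the quoted text.
public class LocalNodeTxSketch {

    public static void updateBothAccounts(Cache<String, Integer> cache) throws Exception {
        // assumes the cache is configured for transactions, so a JTA
        // TransactionManager is available from the advanced cache
        TransactionManager tm = cache.getAdvancedCache().getTransactionManager();

        tm.begin();
        boolean ok = false;
        try {
            // either key may live on a remote node; the local node acquires
            // the remote locks for this transaction
            cache.put("accountA", 10);
            cache.put("accountB", 20);
            ok = true;
        } finally {
            if (ok) {
                tm.commit();
            } else {
                tm.rollback();
            }
        }
    }
}

Everything cluster-related happens behind those put() calls; the caller never addresses a remote node directly.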
ok, so the cluster intelligence is in the local node rather than the client - not that there is any significant distinction for now, as they are co-located.

> a) node goes down before TM issued prepare
> - when TM resurrects and calls XAResource.recover it
> receives the given XID, realises that there's an heuristic
> decision (because it didn't call prepare) and take some
> action (rollbacks other participants, notify sys admin?).

That's not a heuristic decision. An RM is perfectly entitled to throw away any tx state up until prepare. Under the presumed abort doctrine it simply throws an error from prepare and the tx aborts cleanly. Recovery is not involved - it applies only to tx that have reached prepare.

> b) node goes down after TM issues prepare
> - when TM issues a commit it receives an XAException
> (perhaps XA_HEURRB) and again it is aware of the heuristic
> outcome

Returning cleanly from prepare is a promise by the RM to successfully apply any subsequent commit. You're not in a position to make such a promise unless your state is fault tolerant, as a node crash would otherwise leave you with inconsistent state.

It's not as simple as saying you'd roll back - what if you prepare, get told to commit, apply the remote changes (step 4.2.1), then crash before applying the local changes (4.2.2)? You can't report that as a rollback - you applied some of the updates. You have to include the tx in the recovery list as a heuristic hazard (unless NodeA will transparently repopulate with the committed data, in which case you can mask the failure or report a heuristic commit). But how do you even detect the heuristic at recovery time? NodeA has no persistent record of the tx, and NodeB thinks it completed cleanly and has cleaned up its tx record to avoid leaking. Where is the data that tells you you've got a problem?

Or take a more sophisticated scenario where there is an additional NodeC, thus requiring multiple 'apply remote changes' calls. Are those atomic across the cluster? If there is a possibility that NodeB will apply the update but NodeC won't, or that NodeA will crash after issuing the call to B but before the call to C, you can wind up with inconsistent state in the surviving B and C. Alternatively, what if A survives but C crashes whilst applying changes that B has already successfully applied? That's not necessarily a recovery situation as far as the TM is concerned, but it may be from your perspective, as you'll need to detect and (ideally) fix or (as a last resort) report the inconsistent data.

A lot of your behaviour is going to depend on what it means for a node to recover after a crash. If it simply comes up empty and expects to be repopulated from an external source, as with a normal cache, then your relationship to the XAResource of that external source is critical. On the other hand, if your cluster node is itself fault tolerant through replication, then you need to think carefully about how the RM functionality ties into that replication - basically, the tx state information is not local to the node where the XAResource resides, but must be replicated in the same manner as the other data in that node, and that replication must be synchronous at certain state transitions in the tx lifecycle. It's logging recovery information through RPC rather than disk write.

Really interesting things are going to happen if a single transaction spans data that is a mix of a cache copy of data stored persistently in an XA database and data for which Infinispan is the definitive, fault tolerant repository.
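To make that prepare/recover contract concrete, here's a deliberately over-simplified, hypothetical XAResource skeleton - it is not Infinispan code, and all of the helper methods are invented for illustration. The point is simply that recover() only ever reports transactions that reached prepare (presumed abort covers everything earlier), and that a commit you can no longer honour after a crash has to surface as a heuristic outcome rather than a clean rollback:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical sketch only - not Infinispan's actual XAResource.
// Helper methods (haveCompleteTxState, logPrepareRecord, applyChanges,
// txStateFor) are invented placeholders.
public abstract class CacheXAResourceSketch implements XAResource {

    // transactions that survived prepare; for this map to mean anything
    // after a crash it must itself be fault tolerant (replicated or
    // persisted), which is the whole point of the discussion above
    private final Map<Xid, Object> preparedTxs = new ConcurrentHashMap<Xid, Object>();

    public int prepare(Xid xid) throws XAException {
        if (!haveCompleteTxState(xid)) {
            // state lost before prepare: presumed abort applies, the TM
            // rolls back the other participants and recovery never sees us
            throw new XAException(XAException.XAER_RMERR);
        }
        logPrepareRecord(xid);                 // must be durable first
        preparedTxs.put(xid, txStateFor(xid));
        return XA_OK;                          // the promise to commit later
    }

    public void commit(Xid xid, boolean onePhase) throws XAException {
        if (!preparedTxs.containsKey(xid)) {
            // we promised at prepare time but can no longer honour it, and
            // some of the updates may already have been applied remotely:
            // that is a heuristic hazard, not a plain rollback
            throw new XAException(XAException.XA_HEURHAZ);
        }
        applyChanges(xid);
        preparedTxs.remove(xid);
    }

    public Xid[] recover(int flag) throws XAException {
        // only in-doubt (prepared but unresolved) transactions are returned
        return preparedTxs.keySet().toArray(new Xid[0]);
    }

    // details elided from the sketch
    protected abstract boolean haveCompleteTxState(Xid xid);
    protected abstract Object txStateFor(Xid xid);
    protected abstract void logPrepareRecord(Xid xid);
    protected abstract void applyChanges(Xid xid);
}

Note the ordering in prepare(): XA_OK is only returned once the prepare record is safely logged, because that answer is the promise the TM will rely on at commit and recovery time.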
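For the replicated case, the 'recovery log via RPC' point might look something like the following sketch. None of this is an existing Infinispan or JGroups API - ClusterTransport, replicateSync and PrepareRecord are invented names standing in for whatever the cluster transport actually provides - but it shows the one property that matters: prepare must not answer the TM until the backups hold the record, otherwise a crash of this node loses exactly the information recovery depends on.

import java.io.Serializable;
import javax.transaction.xa.Xid;

// Sketch of synchronous, RPC-based recovery logging, assuming the node is
// made fault tolerant by replication. All names are invented for
// illustration.
public final class ReplicatedPrepareLog {

    public interface ClusterTransport {
        // synchronous RPC: returns only once every backup owner has
        // durably stored the record, or throws if it could not
        void replicateSync(PrepareRecord record);
    }

    public static final class PrepareRecord implements Serializable {
        final byte[] globalTxId;
        final byte[] branchQualifier;

        PrepareRecord(byte[] globalTxId, byte[] branchQualifier) {
            this.globalTxId = globalTxId;
            this.branchQualifier = branchQualifier;
        }
    }

    private final ClusterTransport transport;

    public ReplicatedPrepareLog(ClusterTransport transport) {
        this.transport = transport;
    }

    // Called from prepare(), before XA_OK is returned to the TM: this is
    // the RPC equivalent of forcing a prepare record to disk.
    public void logPrepare(Xid xid) {
        transport.replicateSync(new PrepareRecord(
                xid.getGlobalTransactionId(), xid.getBranchQualifier()));
    }
}

Once the tx metadata is spread around the cluster like that, the resource is already halfway towards the coordination role described next.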
To make a cluster appear to the outside world as a single logical entity for transaction purposes, you're pretty much going to wind up doing interposition. That means you're implementing not only an RM but substantial chunks of what amounts to a TM too. Have fun.

Jonathan.