On 6 Jan 2011, at 10:45, Mircea Markus wrote:

>
> On 5 Jan 2011, at 17:19, Jonathan Halliday wrote:
>
>> On 01/05/2011 03:42 PM, Mircea Markus wrote:
>>>
>>> On 5 Jan 2011, at 14:51, Mircea Markus wrote:
>>>
>>>> FYI, a discussion I had with Jonathan around recovery support from TM
>>>>
>>>> On 5 Jan 2011, at 14:43, Jonathan Halliday wrote:
>>>>> On 01/05/2011 02:18 PM, Mircea Markus wrote:
>>>>>
>>>>>> I don't know how the TM recovery process picks up the XAResource
>>>>>> instance on which to call XAResource.recover, but I imagine it expects
>>>>>> this method to return all the prepared (or heuristic completed)
>>>>>> transactions from the _whole transaction branch_, i.e. from the entire
>>>>>> cluster.
>>>>>
>>>>> all from the logical RM, which you happen to implement as a cluster, yes.
>>>>>
>>>>>> I'm asking this because right now there's no way for a node to know all
>>>>>> the prepared transactions in the entire cluster. This is doable but would
>>>>>> involve a broadcast to query the cluster, which might be costly (time
>>>>>> and bandwidth).
>>>>>
>>>>> right. not to mention it should, strictly speaking, block or fail if any
>>>>> node is unreachable, which kinda sucks from an availability perspective.
>>> So if a node does not respond to the broadcast, it is incorrect to return
>>> the prepared transactions received from the other nodes? (is this because
>>> the TM expects to receive some tx that it knows for sure to be prepared?)
>>> Or would a "best effort" be "good enough"? (e.g. I broadcast the query and
>>> return all the results received in 1 sec)
>>
>> hmm, interesting question.
>>
>> Keep in mind that the XA spec dates from a time when a typical large
>> clustered RM was 2-3 Oracle nodes on the same LAN segment. It simply isn't
>> geared to a world where the number of nodes is so large and widely
>> distributed that the probability of *all* of them being available
>> simultaneously is pretty slim.
>> Likewise the number of transaction managers
>> connected to a resource was assumed to be small, often 1, rather than the
>> large N we see on modern clusters / clouds. As a result, the spec either
>> fails to give guidance on some issues because they weren't significant at
>> the time it was written, or implies/mandates behaviour that is
>> counterproductive in modern environments.
>>
>> Thus IMO some compromises are necessary to make XA usable in the real world,
>> especially at scale. To further complicate matters, these are split across
>> RM and TM, with different vendors having different views on the subject. My
>> advice is geared to the way JBossTS drives XA recovery - other TMs may
>> behave differently and make greater or lesser assumptions about compliance
>> with the letter of the spec. As a result you may find that making your RM
>> work with multiple vendors' TMs requires a) configuration options and b) a
>> lot of painful testing. Likewise JBossTS contains code paths and config
>> options geared to dealing with bugs or non-compliant behaviour in various
>> vendors' RMs.
>>
>> Now, on to the specific question: The list returned should, strictly
>> speaking, be complete. There are two problems with that. First, you have to
>> be able to reach all your cluster nodes to build a complete list which, as
>> previously mentioned, is pretty unlikely in a sufficiently large cluster.
>> Your practical strategies are thus as you say: either a) throw an
>> XAException(XAER_RMFAIL) if any node is unreachable within a reasonable
>> timeout and accept that this may mean an unnecessary delay in recovering the
>> subset of tx that are known or b) return a partial list on a best effort
>> basis. The latter approach allows the transaction manager to deal with at
>> least some of the in-doubt tx, which may in turn mean releasing
>> resources/locks in the RM.
>> In general I'd favour that option as having
>> higher practical value in terms of allowing the best possible level of
>> service to be maintained in the face of ongoing failures.
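
To make the two strategies concrete, here is a minimal Java sketch of a cluster-wide recovery scan. It is an illustration under stated assumptions, not Infinispan's or JBossTS's implementation: the class name ClusterRecoveryScan, the use of a Callable per node to stand in for the remote query, the bestEffort flag and the use of plain strings instead of Xid objects are all hypothetical. Only XAException and XAER_RMFAIL come from the standard javax.transaction.xa API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import javax.transaction.xa.XAException;

// Hypothetical helper: gathers in-doubt transaction ids from every node.
final class ClusterRecoveryScan {

    /**
     * Each Callable stands in for "ask one node for its in-doubt tx ids".
     * A node counts as unreachable if its query throws or misses the timeout.
     * Strings replace Xid objects to keep the sketch short (an assumption).
     */
    static List<String> recover(List<Callable<List<String>>> nodes,
                                long timeoutMillis,
                                boolean bestEffort) throws XAException {
        List<String> inDoubt = new ArrayList<>();
        if (nodes.isEmpty()) {
            return inDoubt;
        }
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        int unreachable = 0;
        try {
            // invokeAll cancels every query still running when the timeout expires.
            List<Future<List<String>>> replies =
                    pool.invokeAll(nodes, timeoutMillis, TimeUnit.MILLISECONDS);
            for (Future<List<String>> reply : replies) {
                try {
                    inDoubt.addAll(reply.get());
                } catch (CancellationException | ExecutionException e) {
                    unreachable++; // timed out or failed
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new XAException(XAException.XAER_RMFAIL);
        } finally {
            pool.shutdownNow();
        }
        // Strategy a): refuse to answer unless the list is complete.
        if (unreachable > 0 && !bestEffort) {
            throw new XAException(XAException.XAER_RMFAIL);
        }
        // Strategy b): return whatever was gathered; a later rescan picks up the rest.
        return inDoubt;
    }
}
```

With bestEffort set to true the caller gets the partial list even when some nodes are down; with false the same situation surfaces as XAER_RMFAIL. Whether a given TM tolerates partial lists depends on how it drives recovery, as discussed above.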

+1 In fact on some mainframe implementations of CICS, for example, it behaves exactly like this.

>>
>> JBossTS will rescan every N minutes (2 by default) and thus you can simply
>> include any newly discovered in-doubt tx as they become known due to e.g.
>> partitioned nodes rejoining the cluster, and the TM will deal with them when
>> they are first seen. Note however that some TMs assume that if they scan an
>> RM and that RM does not subsequently crash, no new in-doubt transactions
>> will occur except from heuristics. Let's gloss over how they can even detect
>> a crash/recover of the RM if the driver masks it with failover or the event
>> happens during a period when the TM makes no call on the driver. Such a TM
>> will perform a recovery scan once at TM startup and not repeat. In such case
>> you may have in-doubt tx from nodes unavailable at that crucial time
>> subsequently sitting around for a prolonged period, tying up precious
>> resources and potentially blocking subsequent updates. Most RM vendors
>> provide some kind of management capability for admins to view and manually
>> force completion of in-doubt tx. command line tool, jmx, web gui, whatever,
>> just so long as it exists.
>
> When a node crashes all the transactions that node owns (i.e. tx which were
> originated on that node and XAResource instance residing on that node)
> automatically roll back, so that no resources (locks mainly) are held. The
> only thing we need to make sure though is that the given transaction ids (the
> ones that heuristically rolled back) are returned by the XAResource.recover
> method - doable in the same way we handle prepares. I imagine that we'll have
> to keep these XIDs until XAResource.forget(XID) is called, am I right? Is it
> common/possible for people to use a TM _without_ recovery?
> If so, this "held heuristic completed TX" functionality should be
> configurable (enabled/disabled) in order to avoid memory leaks (no recovery
> means .forget never gets called)

If you're using a transaction manager then use it all. Don't futz about and just use this bit or that bit and still say you're using transactions ;-)

Mark.

---
Mark Little
mlit...@redhat.com

JBoss, by Red Hat

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
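
The "keep heuristic XIDs until forget, but make it configurable" idea from the quoted question could be sketched roughly as below. This is only an illustration: the class HeuristicOutcomeLog, its retainHeuristics switch and the use of plain strings in place of Xid objects are hypothetical names and simplifications, not part of any existing API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical bookkeeping for heuristically rolled-back transactions.
final class HeuristicOutcomeLog {

    // Assumed config switch: when false nothing is retained, so a setup
    // without recovery (where forget is never called) cannot leak memory.
    private final boolean retainHeuristics;
    private final Set<String> heuristicallyRolledBack = new LinkedHashSet<>();

    HeuristicOutcomeLog(boolean retainHeuristics) {
        this.retainHeuristics = retainHeuristics;
    }

    // Called when a crashed node's transaction is rolled back on its behalf.
    synchronized void recordHeuristicRollback(String txId) {
        if (retainHeuristics) {
            heuristicallyRolledBack.add(txId);
        }
    }

    // What a recover() scan would report in addition to prepared transactions.
    synchronized List<String> recover() {
        return new ArrayList<>(heuristicallyRolledBack);
    }

    // Mirrors XAResource.forget: the TM has seen the outcome, so drop the id.
    synchronized void forget(String txId) {
        heuristicallyRolledBack.remove(txId);
    }
}
```

With retention enabled, an id stays visible to every recovery scan until forget is called for it; with retention disabled, heuristic outcomes are never reported, which is the trade-off the question raises.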