On 6 Jan 2011, at 10:45, Mircea Markus wrote:

> 
> On 5 Jan 2011, at 17:19, Jonathan Halliday wrote:
> 
>> On 01/05/2011 03:42 PM, Mircea Markus wrote:
>>> 
>>> On 5 Jan 2011, at 14:51, Mircea Markus wrote:
>>> 
>>>> FYI, a discussion I had with Jonathan about recovery support from the TM
>>>> 
>>>> On 5 Jan 2011, at 14:43, Jonathan Halliday wrote:
>>>>> On 01/05/2011 02:18 PM, Mircea Markus wrote:
>>>>> 
>>>>>> I don't know how the TM recovery process picks up the XAResource 
>>>>>> instance on which to call XAResource.recover, but I imagine it expects 
>>>>>> this method to return all the prepared (or heuristically completed) 
>>>>>> transactions from the _whole transaction branch_, i.e. from the entire 
>>>>>> cluster.
>>>>> 
>>>>> all from the logical RM, which you happen to implement as a cluster, yes.
>>>>> 
>>>>>> I'm asking this because right now there's no way for a node to know all 
>>>>>> the prepared transactions in the entire cluster. This is doable but would 
>>>>>> involve a broadcast to query the cluster, which might be costly (time 
>>>>>> and bandwidth).
>>>>> 
>>>>> right. not to mention it should, strictly speaking, block or fail if any 
>>>>> node is unreachable, which kinda sucks from an availability perspective.
>>> So if a node does not respond to the broadcast, it is incorrect to return 
>>> the prepared transactions received from the other nodes? (is this because 
>>> the TM expects to receive some tx that it knows for sure to be prepared?) 
>>> Or would a "best effort" be "good enough"? (e.g. I broadcast the query and 
>>> return all the results received in 1 sec)
>> 
>> hmm, interesting question.
>> 
>> Keep in mind that the XA spec dates from a time when a typical large 
>> clustered RM was 2-3 oracle nodes on the same LAN segment. It simply isn't 
>> geared to a world where the number of nodes is so large and widely 
>> distributed that the probability of *all* of them being available 
>> simultaneously is pretty slim. Likewise the number of transaction managers 
>> connected to a resource was assumed to be small, often 1, rather than the 
>> large N we see on modern clusters / clouds. As a result, the spec either 
>> fails to give guidance on some issues because they weren't significant at 
>> the time it was written, or implies/mandates behaviour that is counter 
>> productive in modern environments.
>> 
>> Thus IMO some compromises are necessary to make XA usable in the real world, 
>> especially at scale. To further complicate matters, these are split across 
>> RM and TM, with different vendors having different views on the subject. My 
>> advice is geared to the way JBossTS drives XA recovery - other TMs may 
>> behave differently and make greater or lesser assumptions about compliance 
>> with the letter of the spec. As a result you may find that making your RM 
>> work with multiple vendors' TMs requires a) configuration options and b) a 
>> lot of painful testing.  Likewise JBossTS contains code paths and config 
>> options geared to dealing with bugs or non-compliant behaviour in various 
>> vendors' RMs.
>> 
>> Now, on to the specific question: The list returned should, strictly 
>> speaking, be complete. There are two problems with that. First, you have to 
>> be able to reach all your cluster nodes to build a complete list which, as 
>> previously mentioned, is pretty unlikely in a sufficiently large cluster. 
>> Your practical strategies are thus as you say: either a) throw an 
>> XAException(XAER_RMFAIL) if any node is unreachable within a reasonable 
>> timeout and accept that this may mean an unnecessary delay in recovering the 
>> subset of tx that are known or b) return a partial list on a best effort 
>> basis. The latter approach allows the transaction manager to deal with at 
>> least some of the in-doubt tx, which may in turn mean releasing 
>> resources/locks in the RM. In general I'd favour that option as having 
>> higher practical value in terms of allowing the best possible level of 
>> service to be maintained in the face of ongoing failures.

+1

In fact on some mainframe implementations of CICS, for example, it behaves 
exactly like this.
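
For what it's worth, here is a rough sketch of the two strategies Jonathan lists, as they might look behind XAResource.recover. It is illustrative only: ClusterRecoveryScan and collect are made-up names, the per-node replies are modelled as futures, and the id type is left generic rather than tied to any real Infinispan class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import javax.transaction.xa.XAException;

/** Hypothetical helper: merges per-node in-doubt lists for a recovery scan. */
final class ClusterRecoveryScan {

    /**
     * @param perNode       one future per cluster node, yielding that node's in-doubt ids
     * @param timeoutMillis time budget shared by the whole scan, not per node
     * @param bestEffort    true = option (b), return a partial list;
     *                      false = option (a), fail with XAER_RMFAIL
     */
    static <X> List<X> collect(List<CompletableFuture<List<X>>> perNode,
                               long timeoutMillis,
                               boolean bestEffort) throws XAException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<X> inDoubt = new ArrayList<>();
        for (CompletableFuture<List<X>> reply : perNode) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                inDoubt.addAll(reply.get(remaining, TimeUnit.NANOSECONDS));
            } catch (TimeoutException | ExecutionException e) {
                if (!bestEffort) {
                    // Option (a): refuse to return an incomplete list.
                    throw new XAException(XAException.XAER_RMFAIL);
                }
                // Option (b): skip this node; a later scan can pick up its tx.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new XAException(XAException.XAER_RMFAIL);
            }
        }
        return inDoubt;
    }
}
```

With bestEffort set to true, a node that never answers simply contributes nothing to this scan, which only works out if the TM rescans periodically as JBossTS does.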

>> 
>> JBossTS will rescan every N minutes (2 by default) and thus you can simply 
>> include any newly discovered in-doubt tx as they become known due to e.g. 
>> partitioned nodes rejoining the cluster, and the TM will deal with them when 
>> they are first seen. Note however that some TMs assume that if they scan an 
>> RM and that RM does not subsequently crash, no new in-doubt transactions 
>> will occur except from heuristics. Let's gloss over how they can even detect 
>> a crash/recover of the RM if the driver masks it with failover or the event 
>> happens during a period when the TM makes no call on the driver. Such a TM 
>> will perform a recovery scan once at TM startup and not repeat. In such a case 
>> you may have in-doubt tx from nodes unavailable at that crucial time 
>> subsequently sitting around for a prolonged period, tying up precious 
>> resources and potentially blocking subsequent updates. Most RM vendors 
>> provide some kind of management capability for admins to view and manually 
>> force completion of in-doubt tx. command line tool, jmx, web gui, whatever, 
>> just so long as it exists.
> 
> When a node crashes, all the transactions that node owns (i.e. tx which 
> originated on that node, with the XAResource instance residing on that node) 
> automatically roll back, so that no resources (locks mainly) are held. The 
> only thing we need to make sure of, though, is that the given transaction ids 
> (the ones that were heuristically rolled back) are returned by the 
> XAResource.recover method - doable in the same way we handle prepares. I 
> imagine that we'll have to keep these XIDs until XAResource.forget(XID) is 
> called, am I right? Is it common/possible for people to use a TM _without_ 
> recovery? If so, this "held heuristic completed TX" functionality should be 
> configurable (enabled/disabled) in order to avoid memory leaks (no recovery 
> means .forget never gets called).

If you're using a transaction manager then use it all. Don't futz about and 
just use this bit or that bit and still say you're using transactions ;-)
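
That said, the bookkeeping Mircea describes above (hold on to heuristically rolled-back ids until forget is called, with a switch to turn it off) is small enough to sketch. This is only an illustration under assumptions: HeuristicLog and its methods are invented names, and the id type is generic; a real Xid implementation would need proper equals/hashCode for the set to behave.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store for heuristically completed transaction ids. */
final class HeuristicLog<X> {
    private final Set<X> heuristics = ConcurrentHashMap.newKeySet();
    private final boolean enabled;

    /** @param enabled false when no recovery runs, so nothing accumulates */
    HeuristicLog(boolean enabled) {
        this.enabled = enabled;
    }

    /** Called when a tx is rolled back heuristically, e.g. after its owner crashed. */
    void recordHeuristicRollback(X xid) {
        if (enabled) {
            heuristics.add(xid);
        }
    }

    /** Snapshot of ids to merge into the list returned from a recovery scan. */
    Set<X> pending() {
        return new HashSet<>(heuristics);
    }

    /** Mirrors XAResource.forget: the TM has seen the outcome, drop the id. */
    void forget(X xid) {
        heuristics.remove(xid);
    }
}
```

When the log is disabled nothing is retained, so the memory leak cannot happen, but the TM also never learns about those heuristic outcomes.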

Mark.

---
Mark Little
mlit...@redhat.com

JBoss, by Red Hat




