Re: [DB-SIG] Two-phase commit API proposal (was Re: Any standard for two phase commit APIs?)

James Henstridge Wed, 23 Jan 2008 17:44:34 -0800

On 23/01/2008, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2008-01-23 02:18, James Henstridge wrote:
> >> [XID format used in XA]
> >> So, essentially, only the global transaction id and the branch id
> >> are relevant and both are represented in the data string.
> >
> > One interesting part of that is the "If OSI CCR naming is used, then
> > the XID's formatID element should be set to 0; if some other format is
> > used, then the formatID element should be greater than 0."
> >
> > I took a quick look at a few J2EE servers (which use XA), to see what
> > they do for transaction managers.  Neither JBoss nor Geronimo seems to
> > use formatID=0, but instead use magic numbers that I presume are
> > intended to determine if they created the transaction ID.
> >
> > That said, the selection of format identifiers seems a bit ad-hoc:
> > Geronimo uses 0x4765526f, which has a byte representation of "GeRo".
> >
> > It seems that you could do pretty much the same thing by getting TMs
> > to check the global ID itself ...
>
> So we do need to store the "formatID" as well ?
>
> >> BTW, there's a nice extension module that lets you hook Python
> >> between the TM and RM using XA:
> >>
> >>     http://www.hare.demon.co.uk/pyxasw/
> >
> >
> >
> >>> I do see a use for the branch qualifier though.  In a distributed
> >>> transaction, each resource should have a different transaction ID;
> >>> these share a common global transaction ID but have separate branch
> >>> qualifiers.
> >>>
> >>> As transaction IDs are global within database clusters for some
> >>> backends (PostgreSQL, MySQL and probably others), the branch qualifier
> >>> is necessary if two databases from the cluster are used in the global
> >>> transaction.
> >>>
> >>> I think it is worth making the API such that it is easy to program to
> >>> best practices.
> >> The DB-API has always tried to not get in the way of how
> >> a particular backend needs its configuration data, so
> >> I think we can still have a single string using a database
> >> backend specific format. This could then include one or more
> >> of the above id parts.
> >>
> >> The implementation can then decode the string representation
> >> of the transaction id components into whatever format is
> >> needed by the backend.
> >
> > The two reasons I see for using an object to represent transactions
> > that contains a global part and branch part are:
> >
> > 1. round tripping a transaction ID from xa_recover() to
> > xa_commit()/xa_rollback().
> > 2. Reduced restrictions on the contents of the transaction ID.
> >
> > For (1), using a database adapter defined object means that it can
> > represent transactions that originated elsewhere, or expose more
> > information about those transactions.
> >
> > For (2), if a database is using specially formatted transaction IDs at
> > the Python level that get decoded into the various components, does
> > that mean that the application or transaction manager glue needs to
> > know how to format the IDs?
> >
> > In contrast, it is pretty easy for e.g. a Postgres adapter to
> > serialise/deserialise a multi-part ID (and this is what the JDBC
> > driver does).
>
> I have no objections against using an object for this anymore,
> but let's please use an already existing object such as a
> tuple instead of having each database module implement its own
> new type.
>
> Given that the formatID is used for some purpose as well (probably
> just as identification of the TM itself), I guess we'd have
> to use a 3-tuple (format id, global transaction id, branch id).
>
> Modules should only expect to find an object that behaves like
> a 3-sequence, they should accept whatever object is passed to
> them and return it for the recover method.
>
> This leaves the door open for extensions used by the TM for XID
> objects.


I've had a bit more time to think about this, and have two proposals
on how to handle transaction IDs.  I think they offer equivalent
functionality, so the choice comes down to what we want the API to
look like.

Proposal 1:
* Plain string IDs should work fine as transaction identifiers for
  applications built from scratch with that assumption: they would
  need to identify the global and branch parts in their own way.

* A plain string can be stuffed inside an XA style transaction
  identifier, even if it isn't making use of all the different
  components.

* Therefore, all methods accepting transaction IDs should accept
  strings.

* As some transaction IDs in the database might not match this simple
  form, there are two options for the recover() method:
    1. return a special object that represents the transaction, which
       will be accepted by commit()/rollback().  How string-like must
       these objects be?
    2. omit such transaction IDs from the result.

* For databases that support more structured transaction IDs (such as
  those used by XA), the 2PC methods may accept objects other than
  strings.
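
To make the "stuffed inside an XA style transaction identifier" point
concrete, here is a rough sketch of how an adapter might do it.  The
function names and the format ID value are invented for illustration
and are not part of either proposal; using an empty branch qualifier
is also just an assumption of the sketch.

```python
# Hypothetical format ID marking "plain string" transaction IDs.  Any
# value greater than 0 that the adapter reserves for itself would do.
PLAIN_STRING_FORMAT_ID = 0x50594442  # the bytes "PYDB", an invented value

MAX_GTRID_SIZE = 64  # XA allows at most 64 bytes for the global part


def string_to_xa(tid):
    """Stuff a plain string ID into an XA-style (formatID, gtrid, bqual)."""
    gtrid = tid.encode("utf-8")
    if not 1 <= len(gtrid) <= MAX_GTRID_SIZE:
        raise ValueError("transaction ID must encode to 1..64 bytes")
    # Assumption: the branch qualifier is left empty for plain strings.
    return (PLAIN_STRING_FORMAT_ID, gtrid, b"")


def xa_to_string(xid):
    """Return the plain string for an XID, or None if it is not one of ours."""
    format_id, gtrid, bqual = xid
    if format_id != PLAIN_STRING_FORMAT_ID or bqual:
        return None
    try:
        return gtrid.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

With something like this, recover() could return xa_to_string(xid) for
IDs the adapter created itself, and fall back to one of the two options
above when the result is None.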


Proposal 2:

* Many databases follow the XA specification, so it makes sense to use
  transaction identifiers structured in the same way.

* For databases that do not use XA-style transaction IDs, it is
  usually possible to serialise such an ID into a form that it can
  work with.

* Therefore, all methods accepting transaction IDs should accept
  3-sequences of the form (formatID, gtrid, bqual).

* For databases using non-XA transaction IDs, it is possible that some
  transaction IDs might exist that do not match the serialised form.
  The recover() method has two options:
    1. return a special object representing the ID that will be
       accepted by commit()/rollback().  Such an object should act
       like a 3-sequence.
    2. omit such transaction IDs from the result.

* For databases not using XA-style transactions, the 2PC methods may
  accept objects other than 3-sequences as transaction IDs.
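
As a rough illustration of the serialisation step for a backend that
only takes a single string as its transaction ID, something like the
following might work.  The "formatID_gtrid_bqual" layout with base64
parts and all three function names are assumptions made up for this
sketch, not anything a particular driver is known to require.

```python
import base64
import binascii


def serialise_xid(xid):
    """Turn a (formatID, gtrid, bqual) 3-sequence into one string."""
    format_id, gtrid, bqual = xid
    # "_" is not in the standard base64 alphabet, so it is a safe separator.
    return "%d_%s_%s" % (
        format_id,
        base64.b64encode(gtrid).decode("ascii"),
        base64.b64encode(bqual).decode("ascii"),
    )


def parse_xid(text):
    """Inverse of serialise_xid(); returns None for IDs in another form."""
    parts = text.split("_")
    if len(parts) != 3:
        return None
    try:
        format_id = int(parts[0])
        gtrid = base64.b64decode(parts[1], validate=True)
        bqual = base64.b64decode(parts[2], validate=True)
    except (ValueError, binascii.Error):
        return None
    return (format_id, gtrid, bqual)


def recoverable_xids(raw_ids):
    """Option 2 above: silently omit IDs that do not match the form."""
    return [xid for xid in map(parse_xid, raw_ids) if xid is not None]
```

For example, serialising (0x4765526f, b"global-1", b"branch-a") and
parsing the result gives back the same 3-tuple, while an ID such as
"legacy-id" that was prepared by some other tool is dropped from the
recoverable_xids() result.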


Both of these proposals seem to get rid of the main points of contention:
* they remove the xid() constructor from the spec.
* they allow the use of simple objects (strings or tuples) as transaction IDs.
* they provide an obvious way to expose database-specific transaction IDs.
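
For the last point, one possible shape for the "special object" that
recover() could return under Proposal 2 is a tuple subclass that still
behaves like a 3-sequence but remembers the ID exactly as the backend
reported it.  The class name, the "raw" attribute and the helper are
all hypothetical; this is only a sketch of the idea.

```python
class RecoveredXid(tuple):
    """Hypothetical 3-sequence that also keeps the backend's raw ID."""

    def __new__(cls, format_id, gtrid, bqual, raw=None):
        self = super().__new__(cls, (format_id, gtrid, bqual))
        # Extra attribute carrying the ID as the database reported it.
        self.raw = raw
        return self


def raw_id_for(xid):
    """Return the backend's own ID if the object carries one, else None."""
    return getattr(xid, "raw", None)
```

A transaction manager that only knows about 3-sequences can unpack such
an object as usual, while commit()/rollback() in the adapter can call
raw_id_for() first and use the original ID when it is present.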

James.
_______________________________________________
DB-SIG maillist  -  DB-SIG@python.org
http://mail.python.org/mailman/listinfo/db-sig
