[DB-SIG] Two-phase commit API proposal (was Re: Any standard for two phase commit APIs?)

James Henstridge Mon, 21 Jan 2008 02:09:43 -0800

On 18/01/2008, James Henstridge <[EMAIL PROTECTED]> wrote:
> So is there any recommendations for what a two-phase commit API should
> look like?


I did a bit of investigation into a few databases, and came up with a
proposal for an extension to the DB-API.

I know that there are a few incomplete portions of the proposal, so
I'd appreciate feedback.  If you have knowledge of a database not
covered here, please comment on whether the proposed API would be
workable in that context.

Re: the confusion between "prepared transactions" vs. "prepared
statements" support, this probably won't conflict since the prepared
statement extensions I saw used Cursor.prepare() rather than
Connection.prepare().  If it is a problem though, the proposal could
be modified to use prepare_transaction() or similar.

James.

= DB-API Two-Phase Commit =

Many databases have support for two-phase commit.  Adapters for some
of these databases expose this support, but often through mutually
incompatible extensions to the DB-API standard.

Standardising the API for two-phase commit would make it easier for
applications and libraries to support two-phase commit with multiple
databases.

== Operations ==

The flow of a two-phase commit transaction looks something like this:

 1. begin the transaction
 2. do some work
 3. prepare the transaction for commit
 4. commit or rollback the prepared transaction (depending on
 successful preparation of the other transactions).

For some databases, a transaction identifier must be provided.  Some
databases expect the identifier to be structured in two parts: a
global transaction identifier and a branch qualifier.

As it is possible for a failure to occur between preparing a
transaction and committing it, there is often a way to discover those
transactions and issue a commit or rollback.

== Database Support ==

=== PostgreSQL ===

PostgreSQL has supported two-phase commit since 8.1.

To start a two-phase commit transaction, no special command need be
issued other than the standard "BEGIN" statement.

The transaction can be prepared with the "PREPARE TRANSACTION"
statement as described here:

  http://www.postgresql.org/docs/current/static/sql-prepare-transaction.html

A transaction identifier must be provided that is less than 200 bytes
long.

The transaction can then be committed or rolled back using the "COMMIT
PREPARED" or "ROLLBACK PREPARED" statements.

  http://www.postgresql.org/docs/current/static/sql-commit-prepared.html
  http://www.postgresql.org/docs/current/static/sql-rollback-prepared.html

Uncommitted prepared transactions can be recovered by listing the
pg_prepared_xacts table:

  http://www.postgresql.org/docs/current/static/view-pg-prepared-xacts.html

None of the PostgreSQL database adapters expose two-phase commit at
this time.


== MySQL ==

MySQL has supported two-phase commit since 5.0.3.

  http://dev.mysql.com/doc/refman/6.0/en/xa.html
  http://dev.mysql.com/doc/refman/6.0/en/xa-statements.html
  http://dev.mysql.com/doc/refman/6.0/en/xa-states.html

To start a two-phase commit transaction, the "XA START" statement must
be issued.  This statement takes a transaction identifier as an
argument, which consists of a global identifier with optional branch
qualifier and format ID.

The transaction can be prepared by issuing the "XA END" and "XA
PREPARE" statements.

Finally, the prepared transaction can be committed or rolled back with
"XA COMMIT" or "XA ROLLBACK" respectively.

Uncommitted prepared transactions can be listed using the "XA RECOVER"
statement.

The MySQLdb adapter does not appear to expose this functionality.


=== Oracle ===

I'm not familiar with Oracle, but the cx_Oracle adapter does expose
two-phase commit.  The interface is covered here:

  http://cx-oracle.sourceforge.net/html/module.html
  http://cx-oracle.sourceforge.net/html/connobj.html

To use two-phase commit, the twophase=True must be passed to the
cx_Oracle.connect() factory function.  The documentation states that
this option is not the default due to bugs in older versions of
Oracle.

A two-phase commit transaction can be started with the begin() method:

  Connection.begin(formatId, transactionId, branchId)

The transaction is prepared with the Connection.prepare() method, and
can then be committed or rolled back with the standard commit() or
rollback() methods.

==== Questions ====

 1. What happens when you call prepare() on a transaction that was
    begun implicitly.
 2. What happens if you call close() after calling prepare()?
 3. Can you recover uncommitted prepared transactions?


=== Interbase / Firebird ===

The kinterbasdb adapter exposes two-phase commit.

It does not appear to need any special setup for a two-phase commit
transaction -- simply call Connection.prepare() prior to
Connection.commit() or Connection.rollback().

There does not appear to be any API for recovering uncommitted
prepared transactions, but perhaps they don't survive longer than the
connection.


== Summary ==

When looking at these databases, there appear to be a few details
where they differ:

 1. do they require the application to provide a transaction ID?
 2. do they need to know in advance that a given transaction will use
    two-phase commit?
 3. do they provide a recovery API?

To cover difference (1), we have two choices:

 1. Require the user to provide a transaction ID.  Databases that
    don't require a transaction ID can ignore it.
 2. Don't expose transaction IDs, and require adapters that need them
    to generate them (use UUIDs maybe?)
 3. A combination of the two, making it optional to provide
    transaction IDs in the API.

Due to difference (2), it is probably necessary to have a method to
start a two-phase commit transaction.  It should take a transaction
ID (assuming that we are exposing transaction IDs).

As for recovery, it is probably best to leave this out of scope
initially.


== Proposal ==

I think it'd be worth adopting an interface similar to cx_Oracle's and
kinterbasdb's.  That is:

 1. Add a Connection.begin(...) method that explicitly starts a
    transaction.  Some argument (possibly the transaction ID) causes
    the transaction to use two-phase commit.  May raise
    NotSupportedError if two-phase commit is not supported.

    [As some adapters currently have a no-argument begin() to start a
    normal transaction, an argument will be necessary to begin in
    two-phase mode].

 2. Add a Connection.prepare() method that peforms the first stage of
    two-phase commit.  May raise NotSupportedError if two-phase commit
    is not supported, or the transaction was not started in two-phase
    mode.

 3. Calling commit() or rollback() on the connection after prepare()
    performs the second stage of the commit.

 4. Calling commit() or rollback() on the connection prior to
    prepare() performs a one-phase commit or rollback.

 5. Executing statements after prepare() but before commit() or
    rollback() results in an error (ProgrammingError?)

 6. Closing a connection with a prepared but uncommitted transaction
    rolls back that transaction.

 7. Some module-level attribute is present that indicates support for
    this API.

_______________________________________________
DB-SIG maillist  -  DB-SIG@python.org
http://mail.python.org/mailman/listinfo/db-sig

[DB-SIG] Two-phase commit API proposal (was Re: Any standard for two phase commit APIs?)

Reply via email to