[libpqxx-general] Design discussion: transaction types

Jeroen T. Vermeulen Sun, 26 Feb 2006 01:15:04 -0800

Hi everyone,

Here's a design dilemma for libpqxx 3.0 that I wanted to bounce off of
those of you who are interested in getting involved in the library's
design.


This is quite a long post, so if you don't have time or aren't into design
discussions, this is a good time to drop it or start speed-reading!

Here's what I'll be going into:

1. Introduction
2. Transaction properties
3. Interaction of properties
4. Design sketch
5. Open issues

Way back when 2.0 came out, it sported a thoroughly redesigned transaction
class hierarchy called the Transformed Transaction Taxonomy, or TTT for
short.  This gave us a lot more flexibility in defining new types of
transactions.  It also let programs specify things like "any transaction
type will do here," or "this transaction can be any type as long as it's
really a transaction on the back-end."

The TTT was a big step forward.  When PostgreSQL came out with
nested-transaction support, creating a subtransaction type turned out to
be so easy that it was operational and passing unit tests before I fully
realized what I was doing.

Now we're at a new crossroads.  New transaction features have been added
to the backend, and libpqxx should support them: two-phase transactions
are critical for truly dependable complex applications, and read-only
transactions could be useful.

Then there's transaction nesting.  I'd like that to be a bit more flexible
than just having one "outer" transaction and a separate "nested"
transaction type within that.  Why not allow a nontransaction (which does
nothing) within any other transaction type?  Why not hide the difference
between nested transactions and the basic, "outer" transaction types?

I don't think this can be fitted comfortably into the existing transaction
hierarchy.  The TTT distinguishes first between "nontransaction" (no real
transaction on the backend) and "dbtransaction" (actual atomicity).  For
the latter, it then offers "regular" and "extra-robust" service levels. 
Finally, these two types are templated for different isolation levels.
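For readers who haven't looked at the 2.x headers, the hierarchy just described can be modeled roughly as below. This is a simplified stand-in, not the real libpqxx declarations. Constructors, the connection argument and all query functions are left out, and the functions `log_query()` and `create_temp_table()` are hypothetical. The sketch shows how "any transaction type will do" and "must really be a backend transaction" map onto base-class parameters.

```cpp
#include <type_traits>

// Simplified stand-in for the 2.x transaction tree; not the real headers.
enum class isolation_level { read_committed, serializable };

class transaction_base {
public:
  virtual ~transaction_base() = default;
};

// No backend transaction: each statement takes effect on its own.
class nontransaction : public transaction_base {};

// A real, atomic backend transaction.
class dbtransaction : public transaction_base {};

// The two service levels.
class basic_transaction : public dbtransaction {};
class basic_robusttransaction : public dbtransaction {};

// Each service level templated on the isolation level.
template <isolation_level ISO = isolation_level::read_committed>
class transaction : public basic_transaction {};

template <isolation_level ISO = isolation_level::read_committed>
class robusttransaction : public basic_robusttransaction {};

// "Any transaction type will do here." (hypothetical function)
inline void log_query(transaction_base &) {}

// "Any type, as long as it's really a transaction on the back-end."
inline void create_temp_table(dbtransaction &) {}

// This is why create_temp_table() rejects a nontransaction at compile time.
static_assert(!std::is_base_of<dbtransaction, nontransaction>::value,
              "a nontransaction is not a dbtransaction");
```

A read-only or two-phase flavour would cut across both service-level branches and every isolation level. This is where a single tree stops fitting.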

Where would read-only and 2-phase transactions fit in?  Where should
subtransactions really go in the hierarchy?  A single hierarchy just
doesn't seem to cut it anymore.  There are just too many transaction
properties that matter:


Transaction properties

 * Bracketing.  Does a given transaction type really represent an atomic
transaction on the backend, one that can be either committed or aborted
as a whole?

 * Transaction context.  Is a given transaction object executing in a
backend transaction?  A transaction object is in a real transaction
context if it's a real backend transaction itself, or if it is nested
within one.  This could be relevant for things like temp tables, which
like to live inside transactions.

 * Nesting (requires recent backend).  Is a bracketed transaction running
inside another bracketed transaction?

 * Isolation level.  The current solution is a hierarchy where some of the
branches "have" isolation levels.  A transaction's isolation level
defines how changes made in other transaction contexts are visible to
that transaction.

 * Service level.  Having a separate "robust" service level only makes
sense in some situations: it's not relevant in nested transactions, for
instance.

 * Write access (requires recent backend).  I'd really like to have
read-only transactions as part of the transaction hierarchy, now that
postgres supports them.  Service levels and transaction bracketing don't
matter for read-only transactions.

 * Two-phase commit (requires recent backend).  This is really cool, and
essential in high-end systems: let's say an application performs an
action that involves changes to two databases--say a transfer of credit. 
It can issue "commit" statements on the two databases individually, but
what if there's an error from the second database?  The two-phase commit
scheme lets the application issue a commit that checks for all possible
errors and makes the database server remember the changes even after a
crash, but isn't quite final.  Once both databases have "promised to
commit," the application or some middleware can issue a final
"second-phase" commit to both.  This (almost) can't possibly go wrong.

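Below is a minimal sketch of the coordinator's side of the two-phase scheme in the last bullet, assuming two or more participant databases. Phase one corresponds to PostgreSQL's `PREPARE TRANSACTION 'gid'`. Phase two corresponds to `COMMIT PREPARED 'gid'` or `ROLLBACK PREPARED 'gid'`. The function names and the `gid` value are hypothetical, and nothing here talks to a server. A real coordinator would also have to record its decision durably before starting phase two.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical two-phase commit coordinator (decision logic only).
enum class decision { commit_prepared, rollback_prepared };

// One entry per participant: true if its PREPARE TRANSACTION succeeded,
// i.e. that database has "promised to commit".
template <std::size_t N>
constexpr decision decide(const std::array<bool, N> &prepared_ok) {
  for (std::size_t i = 0; i < N; ++i)
    if (!prepared_ok[i]) return decision::rollback_prepared;
  return decision::commit_prepared; // every participant promised
}

// SQL for phase one. gid is a global transaction identifier chosen by the
// application; this sketch assumes it is safe to embed as a literal.
inline std::string phase_one_sql(const std::string &gid) {
  return "PREPARE TRANSACTION '" + gid + "'";
}

// SQL for phase two, sent to each participant that did prepare.
inline std::string phase_two_sql(decision d, const std::string &gid) {
  return (d == decision::commit_prepared ? "COMMIT PREPARED '"
                                         : "ROLLBACK PREPARED '") +
         gid + "'";
}
```

Phase two names only the transaction identifier, so it can be issued from a different connection to the same database, or after a restart. This is what lets the scheme survive a crash between the two commits.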

I'd like to make as many of these properties as possible visible at
compile time, so programs can demand certain properties in ways that don't
require testing.  But that will get a bit more complicated: you won't be
able to say just "this function takes a dbtransaction argument."
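One way to make properties visible at compile time is to attach them to each transaction type as constants and let functions state what they need. The sketch below takes that approach. All names are hypothetical. Properties that depend on nesting are deliberately left out because a fixed type can't know them. One example is whether a nontransaction runs inside a backend transaction.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical: a transaction type carries its properties as constants.
template <bool InBackendTx, bool Writable, bool TwoPhase>
struct tx_properties {
  static constexpr bool in_backend_tx = InBackendTx; // real backend transaction
  static constexpr bool writable = Writable;         // not read-only
  static constexpr bool two_phase = TwoPhase;        // supports prepare
};

template <typename Props>
class basic_tx {
public:
  using properties = Props;
};

using nontransaction = basic_tx<tx_properties<false, true, false>>;
using transaction = basic_tx<tx_properties<true, true, false>>;
using readonly_transaction = basic_tx<tx_properties<true, false, false>>;
using twophase_transaction = basic_tx<tx_properties<true, true, true>>;

// Needs a backend transaction context (say, for temp tables). Any type
// with that property is accepted; others fail to compile.
template <typename TX>
std::enable_if_t<TX::properties::in_backend_tx> create_temp_table(TX &) {}

// Needs write access; read-only transactions are rejected at compile time.
template <typename TX>
std::enable_if_t<TX::properties::writable> transfer_credit(TX &) {}

// Detection helpers, here only so the constraints can be checked.
template <typename TX, typename = void>
struct accepts_temp_table : std::false_type {};
template <typename TX>
struct accepts_temp_table<
    TX, std::void_t<decltype(create_temp_table(std::declval<TX &>()))>>
    : std::true_type {};

template <typename TX, typename = void>
struct accepts_transfer : std::false_type {};
template <typename TX>
struct accepts_transfer<
    TX, std::void_t<decltype(transfer_credit(std::declval<TX &>()))>>
    : std::true_type {};
```

Passing a `readonly_transaction` to `transfer_credit()` is then a compile error rather than a runtime failure. Any writable type is accepted without naming a common base class.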


Interaction of properties

Now that I've described the essential properties of various transaction
types, I'll go into how they interact:

 * Nesting, in principle, is something the library can figure out internally.

 * Nontransactions (i.e. with no bracketing) could be nested arbitrarily
in any context, and apart from bracketing, inherit all of their parents'
properties.

 * It could make sense to have a read-only transaction nested inside a
regular one, but not the other way around.

 * Read-only transactions can only occur inside a backend transaction
context.

 * Service levels (regular and "robust") make sense only for "outer"
transactions, not for nested ones.  Nor do they make sense for read-only
transactions.

 * Two-phase transactions don't make sense for read-only transactions
either, since there are no changes to be careful about.

 * The "robust" service level may use two-phase transactions internally,
if the backend version supports them.  We could forbid combining them, or
revert to a regular service level if a two-phase transaction is ongoing. 
(Note: I don't think there's a way of making robust transactions an
emulation of two-phase commit on backends that don't support two-phase,
but even if there is, I don't think it would be quite as safe as the real
thing--so better not pretend that it is).

 * We could simplify things by saying that a nested transaction should
logically see its outer transaction as if it had been committed.  After
all, if the outer transaction does get rolled back, the nested
transaction is aborted as well so that it doesn't matter what it sees or
doesn't see.  So one way of seeing things is that nontransactions have
read-committed isolation level, and subtransactions inherit whatever
isolation level the outer transaction has.
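The interaction rules above are small enough to write down as a set of checks. The sketch below is one hypothetical encoding, not a settled design. `may_nest()` says whether a transaction kind may be created in a given context. `enter()` computes the context that the new transaction's own children would see, including the isolation-level inheritance suggested in the last bullet. The idea that a robust transaction might use two-phase commit internally is not modeled.

```cpp
// Hypothetical encoding of the interaction rules; all names invented.
enum class kind { nontransaction, regular, robust, readonly, twophase };
enum class isolation { read_committed, serializable };

// What a new transaction object sees of the context it's created in.
struct context {
  bool in_backend_tx; // a real backend transaction is already open
  bool read_only;     // an enclosing transaction is read-only
  isolation level;    // isolation level in effect
};

// A bare session: no backend transaction, read-committed by default.
constexpr context session{false, false, isolation::read_committed};

// Would a bracketed transaction created here be the outermost one?
constexpr bool is_outermost(context c) { return !c.in_backend_tx; }

// May a transaction of kind k be created in context c?
constexpr bool may_nest(context c, kind k) {
  switch (k) {
  case kind::nontransaction:
    return true;            // allowed anywhere
  case kind::readonly:
    return true;            // read-only inside regular is fine...
  case kind::regular:
    return !c.read_only;    // ...but not regular inside read-only
  case kind::robust:
  case kind::twophase:
    return is_outermost(c); // service levels and 2PC: outer only
  }
  return false;
}

// Context seen by the new transaction's children (assumes may_nest(c, k)).
constexpr context enter(context c, kind k, isolation requested) {
  if (k == kind::nontransaction)
    return c; // inherits everything except bracketing
  return context{true, c.read_only || k == kind::readonly,
                 is_outermost(c) ? requested : c.level};
}
```

A nontransaction at session level ends up with read-committed isolation simply by inheriting the session default. A subtransaction keeps whatever level its outermost transaction chose.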


Design sketch

Here's what I had in mind, roughly, for the new transactions setup:

1. Separate "context" and "policy" classes.  Connections and transactions
would share a common base class that defines them as "contexts you can
define transactions in."  This has several implementation advantages:

 - Less boilerplate code for delegation: all the nonpublic functions that
are currently in the connection hierarchy but must be made available to
transaction classes and such will be in a common shared base.  Currently
this is done using tons of little transaction functions that just call
functions with the same name on the transaction's connection.

 - Fewer "friend" declarations to make the above boilerplate code
possible; just use "protected" instead.

 - No virtual functions in the transaction hierarchy, only a handful of
small virtual functions in the policy classes.  Doing this for the
connection family worked out really well, but it turned out to be a very
hard job for transactions in the current design.

2. No more deep transaction hierarchy visible to the program.  We'd end up
with a relatively flat list of transaction types along the lines of:
 - nontransaction
 - regular transaction (with isolation level if non-nested)
 - robust transaction (only if non-nested; with isolation level)
 - read-only transaction (with isolation level if non-nested)
 - two-phase transaction (only if non-nested; with isolation level)

3. There is no real distinction between committing and aborting
nontransactions or read-only transactions.  We can start thinking of these
as "zero-phase commit" transaction types!
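The following is a rough sketch of the context/policy split under heavy assumptions. All class names are hypothetical, SQL is appended to a log instead of being executed, and only a regular policy is shown. The common base gives connections and transactions the same protected plumbing, and the policy holds the only virtual functions. Whether a transaction is outer or nested is decided at runtime from its depth, and a nested transaction uses a PostgreSQL savepoint.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical context/policy split; not libpqxx API. SQL is only logged.
class transaction_context {
protected:
  using log_type = std::vector<std::string>;

  // Top-level context (a connection): starts a fresh session log.
  transaction_context() : log_(std::make_shared<log_type>()), depth_(0) {}

  // Child context: same session as the parent, one level deeper.
  explicit transaction_context(transaction_context &parent)
      : log_(parent.log_), depth_(parent.depth_ + 1) {}

  // Shared plumbing, reachable from derived classes without "friend".
  void raw_exec(const std::string &sql) { log_->push_back(sql); }
  int depth() const { return depth_; }

  std::shared_ptr<log_type> log_;

private:
  int depth_;
};

class connection : public transaction_context {
public:
  const std::vector<std::string> &log() const { return *log_; }
};

// Policies hold the few virtual hooks.
struct tx_policy {
  virtual ~tx_policy() = default;
  virtual std::string begin_sql(int depth) const = 0;
  virtual std::string commit_sql(int depth) const = 0;
};

// Depth 1 means "created directly on the connection": the outermost level.
struct regular_policy : tx_policy {
  std::string begin_sql(int depth) const override {
    return depth == 1 ? "BEGIN" : "SAVEPOINT sp" + std::to_string(depth);
  }
  std::string commit_sql(int depth) const override {
    return depth == 1 ? "COMMIT"
                      : "RELEASE SAVEPOINT sp" + std::to_string(depth);
  }
};

class transaction : public transaction_context {
public:
  transaction(transaction_context &parent, std::unique_ptr<tx_policy> policy)
      : transaction_context(parent), policy_(std::move(policy)) {
    raw_exec(policy_->begin_sql(depth()));
  }
  void exec(const std::string &sql) { raw_exec(sql); }
  void commit() { raw_exec(policy_->commit_sql(depth())); }

private:
  std::unique_ptr<tx_policy> policy_;
};

// The design goal: no virtual functions in the transaction hierarchy.
static_assert(!std::is_polymorphic<transaction>::value, "non-virtual");

// Logs: BEGIN, SAVEPOINT sp2, CREATE TEMP TABLE scratch (x int),
// RELEASE SAVEPOINT sp2, COMMIT.
inline std::vector<std::string> demo() {
  connection c;
  transaction outer(c, std::make_unique<regular_policy>());
  transaction inner(outer, std::make_unique<regular_policy>());
  inner.exec("CREATE TEMP TABLE scratch (x int)");
  inner.commit();
  outer.commit();
  return c.log();
}
```

One C++ detail shapes this design. A derived class can only reach protected members through objects of its own type. For that reason, the child constructor that copies the parent's session lives in the base class itself.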


Open issues

 * If we go with a common "transaction context" base class for connections
and transactions, should programs be allowed to execute queries directly
on the connection without creating any kind of transaction context?  It
would make a noticeable difference to very small programs, such as
programming examples.

 * To what extent should the difference between "outer" and "nested"
transactions be hidden?  Should there be different types for this?  Some
of the properties we've looked at are really tied to that one important
step, "is this new transaction going to be the outermost bracketing on
this session?"  We could make that a primary distinction with separate
transaction types, or we could hide it and decide at runtime which kind
of transaction to implement when the program creates a new transaction
object.

 * What should the APIs for 0-phase, 1-phase, and 2-phase transactions
look like?  In the current system a commit function is available, but
a 0-phase transaction doesn't need to call it.  But it
all works a little differently with 2-phase: the way this is implemented
in the backend, 2-phase adds a third alternative (prepare) to the
existing ones (commit/abort).  It acts like a commit in some ways, except
that the transaction's work will only become effective once the ultimate
second-phase commit is given.  That ultimate commit may come from the
same connection and/or program, or from somewhere else; remember that
2-phase must still work even if the system has crashed somewhere between
the commits, and the application may have to complete its work after a
restart.

 * Maybe the pipeline concept can be worked into the execution context
hierarchy, so that library features that may need to execute SQL commands
internally (variables, nested transactions, tablestreams etc.) can be
used normally inside pipelines.  Effectively, pipelines would become just
another transaction context, with an extended interface.  Even nesting of
pipelines could be supported: a nested pipeline wouldn't do anything
itself, delegating all logic to the outermost pipeline, but this could
make it much easier to pipeline sequences of queries generated by the
library itself.
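For the 0/1/2-phase question, one hypothetical shape is a single state machine. In it, `prepare` exists only for two-phase transactions. A prepared transaction accepts only a second-phase commit or an abort, corresponding to PostgreSQL's `COMMIT PREPARED` or `ROLLBACK PREPARED`. The names are invented for illustration. Since the prepare step is described as a third alternative alongside commit and abort, this sketch also lets an active two-phase transaction commit in one step.

```cpp
// Hypothetical commit-API state machine; not libpqxx's interface.
enum class phases { zero, one, two };
enum class state { active, prepared, committed, aborted, invalid };
enum class action { commit, abort, prepare, commit_prepared };

// 0-phase (nontransaction, read-only): commit is optional and an abort
// has nothing to undo.
constexpr bool needs_commit(phases p) { return p != phases::zero; }

constexpr state next(phases p, state s, action a) {
  switch (s) {
  case state::active:
    if (a == action::commit) return state::committed;
    if (a == action::abort) return state::aborted;
    if (a == action::prepare)
      return p == phases::two ? state::prepared : state::invalid;
    return state::invalid; // commit_prepared needs a prepare first
  case state::prepared:
    // Second phase: may come from this program or from somewhere else.
    if (a == action::commit_prepared) return state::committed;
    if (a == action::abort) return state::aborted;
    return state::invalid;
  default:
    return state::invalid; // committed and aborted are final
  }
}
```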


So that's what's going on design-wise at the moment.  Most of you will
have something better to do than to engage in a discussion about this, and
that's fine; if nothing else, I'll have documented some important
considerations and hammered out some concepts I needed to get clear in my
own mind.  But if one of you has any useful thoughts on these issues, I'd
love to hear them!


Jeroen


_______________________________________________
Libpqxx-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
