Hi Jeroen,

I like the design idea in general. I'll try to give my $0.02 on all of
the open issues you mentioned.

Jeroen T. Vermeulen wrote:
>  * If we go with a common "transaction context" base class for connections
> and transactions, should programs be allowed to execute queries directly
> on the connection without creating any kind of transaction context?  It
> would make a noticeable difference to very small programs, such as
> programming examples.

And it would also serve to encourage people to create larger pieces of
code ignoring transactions altogether, code that can't migrate to
transactions later. Instead of what you'd write today (pseudocode; I
haven't actually had time to use pqxx in ages):

   // Execute operation foo on any transaction trans
   void foo(transaction& trans);

you would make it possible for people who don't care about transactions
to pass around connections:

   // Execute operation foo on any connection conn
   void foo(connection& conn);

It would become extremely simple for them to write code like this. And
knowing most programmers who come from other libraries, almost everybody
*will* do it like this; nobody thinks in terms of transactions before
coming to pqxx! Not only would this keep the code from being usable in
both transactional and nontransactional contexts (unless the code
automatically falls into the deepest nested transaction, something I
would discourage because it would become very opaque whether something
was committed or not), it would also make migrating code from a
nontransactional to a transactional model very difficult later on. No, I
like the current model where you're *forced* to take transactions into
account early on in your design. It promotes proper design of database
interaction and puts transactional integrity where it should be: at the
front of people's minds.

Another problem with the approach is that people might mix calls to a
transaction object with calls to the connection. This is already a
problem with transactions: what if somebody executes a query on an outer
transaction while a nested transaction is open? Until now, this was not
a problem because most "simple" users coming from different backgrounds
(and usually with nowhere near enough grounding in how databases and
transactions should be used) would not use nested transactions anyway.
Allowing queries to be executed directly on the connection would make
this query-on-the-wrong-transaction mistake possible for these "simple"
users too.
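To make the wrong-transaction problem concrete, here is a toy model in which a library could at least catch the mistake at runtime. Everything in it is hypothetical (`session`, `transaction` and the level-counting check are made up for illustration, not anything pqxx does): each transaction remembers its nesting level, and only the innermost open one accepts queries.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy model, NOT the libpqxx API. The session counts how many
// transactions are open; each transaction remembers its own nesting level.
struct session {
  int depth = 0;                  // number of currently open transactions
  std::vector<std::string> sent;  // SQL that would have gone to the server
};

class transaction {
public:
  // Nests inside whatever transaction is currently innermost on the session.
  explicit transaction(session& s) : s_(s), level_(++s.depth) {}
  ~transaction() {
    // Scoped objects are destroyed innermost-first, so this should hold.
    assert(level_ == s_.depth);
    --s_.depth;
  }
  transaction(const transaction&) = delete;
  transaction& operator=(const transaction&) = delete;

  void exec(const std::string& sql) {
    // Querying an outer transaction while a nested one is open is the
    // "query on the wrong transaction" mistake: refuse it loudly.
    if (level_ != s_.depth)
      throw std::logic_error("query on transaction level " +
                             std::to_string(level_) + " while level " +
                             std::to_string(s_.depth) + " is open");
    s_.sent.push_back(sql);
  }

private:
  session& s_;
  int level_;
};
```

This only detects the mistake when it happens, at runtime; it does nothing to stop it from being written in the first place.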

Overall, I think it would be a *very* bad idea to do this; it's asking
for trouble. :-)

>  * To what extent should the difference between "outer" and "nested"
> transactions be hidden?  Should there be different types for this? 

No, I think not. I think that there are two different properties of
transactions that people care about here:

1. Atomicity. For atomicity, nestedness doesn't matter: the enclosing
transaction will be atomic as well.

2. Persistence. Some people will want to know for sure whether something
persisted or not, and keep track of the persisted database state in the
client state. Actually, that's the role of transactors (besides being
able to retry transactions automatically): updating the client state
only when the change has actually been persisted in the database. In
that sense, a transactor on a nested transaction no longer satisfies
that property of the original transactors. Perhaps there should be a
"nested transactor" model parallel to the "nested transaction" model,
where nested transactors would apply their changes to the client program
state only once the complete (outermost) transaction had committed.
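A rough sketch of what such a "nested transactor" model could look like, assuming a made-up `txn` class (this is not pqxx's transactor API): client-side updates are registered as callbacks, a nested commit merely hands them up to its parent, and they run only when the outermost transaction commits. An abort anywhere along the way throws them away.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT libpqxx. Client-state updates wait until the
// OUTERMOST transaction commits, so the client never records something the
// database has not actually persisted.
class txn {
public:
  txn() = default;                                // outermost transaction
  explicit txn(txn* parent) : parent_(parent) {}  // nested transaction
  txn(const txn&) = delete;
  txn& operator=(const txn&) = delete;

  // Register a change to client program state that must wait for persistence.
  void on_persisted(std::function<void()> f) { pending_.push_back(std::move(f)); }

  void commit() {
    assert(!done_);
    done_ = true;
    if (parent_) {
      // Nested commit: nothing is persisted yet, so pass the pending
      // updates up to the enclosing transaction instead of running them.
      for (auto& f : pending_) parent_->pending_.push_back(std::move(f));
    } else {
      // Outermost commit (the real COMMIT would be sent here): now it is
      // safe to update the client state.
      for (auto& f : pending_) f();
    }
    pending_.clear();
  }

  // Aborting discards any client updates that were waiting on this level.
  void abort() {
    done_ = true;
    pending_.clear();
  }

private:
  txn* parent_ = nullptr;
  std::vector<std::function<void()>> pending_;
  bool done_ = false;
};
```

If a nested transaction commits but its parent later aborts, the nested transaction's client updates are dropped along with the parent's, which is exactly the persistence guarantee described above.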

> Some
> of the properties we've looked at are really tied to that one important
> step, "is this new transaction going to be the outermost bracketing on
> this session?"  We could make that a primary distinction with separate
> transaction types, or we could hide it and decide at runtime which kind
> of transaction to implement when the program creates a new transaction
> object.

The difference only really matters when people care about (2), not about
(1). We shouldn't make things too hard for people who only care about
(1), so no separate types as far as I'm concerned. If anything, you
could make the toplevel transaction type a separate subclass, but I'm
afraid that people will start asking for toplevel transaction references
everywhere:

   void foo(toplevel_transaction& trans);

instead of asking for general transactions:

   void foo(transaction& trans);

simply because they normally only instantiate toplevel_transaction
objects, never having thought about nested transactions yet.
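For comparison, hiding the distinction could look like this: one transaction type whose constructor checks at runtime whether it is the outermost bracketing on the session, using BEGIN/COMMIT/ROLLBACK if so and PostgreSQL savepoints otherwise. This is a sketch under made-up names (`session`, `transaction`, the `spN` savepoint naming), and the session only records the SQL rather than talking to a server.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch, NOT the libpqxx API. One transaction type decides at
// runtime whether it is outermost (BEGIN/COMMIT) or nested (SAVEPOINT).
struct session {
  int open = 0;                   // currently open transaction levels
  std::vector<std::string> sent;  // SQL that would be sent, in order
};

class transaction {
public:
  explicit transaction(session& s)
      : s_(s), nested_(s.open > 0), name_("sp" + std::to_string(s.open)) {
    s_.sent.push_back(nested_ ? "SAVEPOINT " + name_ : "BEGIN");
    ++s_.open;
  }
  // Like libpqxx transactions, an uncommitted transaction aborts on exit.
  ~transaction() {
    if (!done_) abort();
  }
  transaction(const transaction&) = delete;
  transaction& operator=(const transaction&) = delete;

  void commit() { finish(nested_ ? "RELEASE SAVEPOINT " + name_ : "COMMIT"); }
  void abort() {
    // ROLLBACK TO SAVEPOINT keeps the savepoint defined until the enclosing
    // transaction ends, which is harmless here.
    finish(nested_ ? "ROLLBACK TO SAVEPOINT " + name_ : "ROLLBACK");
  }
  bool is_outermost() const { return !nested_; }

private:
  void finish(const std::string& sql) {
    assert(!done_);
    s_.sent.push_back(sql);
    done_ = true;
    --s_.open;
  }

  session& s_;
  bool nested_;
  std::string name_;
  bool done_ = false;
};
```

Code that only cares about atomicity just takes a `transaction&` and never has to ask which kind it got; code that cares about persistence can still check `is_outermost()`.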

>  * What should the APIs for 0-phase, 1-phase, and 2-phase transactions
> look like?  The current system is effectively that a commit function is
> available, but you don't need to use it in a 0-phase transaction.  But it
> all works a little differently with 2-phase: the way this is implemented
> in the backend, 2-phase adds a third alternative (prepare) to the
> existing ones (commit/abort).  It acts like a commit in some ways, except
> that the transaction's work will only become effective once the ultimate
> second-phase commit is given.  That ultimate commit may come from the
> same connection and/or program, or from somewhere else; remember that
> 2-phase must still work even if the system has crashed somewhere between
> the commits, and the application may have to complete its work after a
> restart.

Hmmmm. I guess that perhaps you could split the 2pc stuff into two classes:

1. transaction (regular or robust)
2. 2pc transaction context

You can then run a regular or robust transaction in any 2pc transaction
context; it does "prepare" where it would normally do "commit" (basically
because the 2pc transaction context implements "commit" as "prepare", so
the transaction doesn't need to know about this). And then you can have a
whole hierarchy of 2pc transaction contexts that handle the final commit
in different ways.
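A sketch of that split, assuming made-up class names (`commit_context`, `twophase_context`, `transaction`); only the SQL statements are real PostgreSQL (PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED, which need `max_prepared_transactions` set above zero on the server). The transaction just asks its context what "commit" means, and the two-phase context answers "prepare".

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the libpqxx API. SQL is recorded, not sent.
struct session {
  std::vector<std::string> sent;
};

// Decides what "commit" means for a transaction running in this context.
class commit_context {
public:
  virtual ~commit_context() = default;
  virtual std::string commit_statement() const { return "COMMIT"; }
};

// Two-phase context: an ordinary transaction's "commit" becomes phase one.
// The final decision is a separate step that may be taken later, from
// another connection, or after a restart.
class twophase_context : public commit_context {
public:
  // gid is inserted verbatim (no escaping): it must be a trusted identifier.
  explicit twophase_context(std::string gid) : gid_(std::move(gid)) {}
  std::string commit_statement() const override {
    return "PREPARE TRANSACTION '" + gid_ + "'";
  }
  void commit_prepared(session& s) const {
    s.sent.push_back("COMMIT PREPARED '" + gid_ + "'");
  }
  void rollback_prepared(session& s) const {
    s.sent.push_back("ROLLBACK PREPARED '" + gid_ + "'");
  }

private:
  std::string gid_;
};

// A regular transaction: it never needs to know it is part of a 2pc.
class transaction {
public:
  transaction(session& s, const commit_context& ctx) : s_(s), ctx_(ctx) {
    s_.sent.push_back("BEGIN");
  }
  void exec(const std::string& sql) { s_.sent.push_back(sql); }
  void commit() {
    assert(!committed_);
    s_.sent.push_back(ctx_.commit_statement());
    committed_ = true;
  }

private:
  session& s_;
  const commit_context& ctx_;
  bool committed_ = false;
};
```

Other subclasses of `twophase_context` could then implement the final step differently, e.g. by handing the global transaction identifier to an external coordinator instead of committing it directly.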

>  * Maybe the pipeline concept can be worked into the execution context
> hierarchy, so that library features that may need to execute SQL commands
> internally (variables, nested transactions, tablestreams etc.) can be
> used normally inside pipelines.  Effectively, pipelines would become just
> another transaction context, with an extended interface.  Even nesting of
> pipelines could be supported: a nested pipeline wouldn't do anything
> itself, relegating all logic to the outermost pipeline, but this could
> make it much easier to pipeline sequences of queries generated by the
> library itself.

Sounds very cool. You could attach a pipeline to any transaction context
as a subcontext, and then feed the pipeline using all normal library
features. Yeah, I like this.
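Roughly how a pipeline could slot into the context hierarchy, under made-up names (`exec_context`, `recording_transaction`, `pipeline`; this is not the existing pqxx::pipeline interface). A pipeline is just another context that queues queries, and a nested pipeline forwards everything to the outermost one, so library helpers that run SQL internally end up in a single batch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch, NOT the libpqxx pipeline API. Everything that can run
// SQL implements one interface, so helpers written against exec_context
// also work inside a pipeline.
class exec_context {
public:
  virtual ~exec_context() = default;
  virtual void exec(const std::string& sql) = 0;
  // Fallback: send a batch one statement at a time.
  virtual void exec_batch(const std::vector<std::string>& batch) {
    for (const auto& sql : batch) exec(sql);
  }
};

// Stand-in for a transaction: records which statements share a round trip.
class recording_transaction : public exec_context {
public:
  std::vector<std::vector<std::string>> round_trips;
  void exec(const std::string& sql) override { round_trips.push_back({sql}); }
  void exec_batch(const std::vector<std::string>& batch) override {
    if (!batch.empty()) round_trips.push_back(batch);
  }
};

class pipeline : public exec_context {
public:
  explicit pipeline(exec_context& parent)
      : parent_(parent), outer_(dynamic_cast<pipeline*>(&parent)) {
    // A nested pipeline does nothing itself; it forwards to the outermost.
    if (outer_ && outer_->outer_) outer_ = outer_->outer_;
  }
  // Without this overload, pipeline(other_pipeline) would pick the copy
  // constructor instead of nesting.
  explicit pipeline(pipeline& parent)
      : pipeline(static_cast<exec_context&>(parent)) {}
  pipeline(const pipeline&) = delete;
  pipeline& operator=(const pipeline&) = delete;
  ~pipeline() override { assert(queue_.empty() && "flush() before destruction"); }

  void exec(const std::string& sql) override {
    if (outer_) outer_->exec(sql);  // nested: relegate to the outermost
    else queue_.push_back(sql);     // outermost: queue for one round trip
  }
  void flush() {
    if (outer_) return;  // only the outermost pipeline talks to its parent
    parent_.exec_batch(queue_);
    queue_.clear();
  }

private:
  exec_context& parent_;
  pipeline* outer_;  // outermost pipeline, or nullptr if this is it
  std::vector<std::string> queue_;
};
```

Flushing explicitly (rather than in the destructor) keeps errors from the server out of destructors; a real design might choose differently.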

BTW, have you noticed that with the new design you'll probably be able
to attach a nontransaction as a subtransaction of a transaction? I don't
know if that's bad. Probably not: it means you can run a nontransaction
directly on a connection or on something else, and if you templatize a
function that uses a nontransaction on its transaction context, then it
can run directly on a connection or on a transaction.

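   // Proposed design: T may be a connection or any transaction context
   template<typename T>
   void foo(T& context)
   {
      nontransaction trans(context);   // attaches to whatever was passed in
      // do stuff with trans
   }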
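To check that the template idea holds together, here is a toy hierarchy (made up for illustration, not pqxx's actual classes) in which `connection`, `transaction` and `nontransaction` all derive from a common `transaction_context`, so the same `foo()` instantiates for both a connection and a transaction.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy hierarchy, NOT the libpqxx API: connections and
// transactions share one base, so a nontransaction can attach to either.
class transaction_context {
public:
  virtual ~transaction_context() = default;
  virtual std::vector<std::string>& log() = 0;  // where executed SQL ends up
};

class connection : public transaction_context {
public:
  std::vector<std::string>& log() override { return log_; }

private:
  std::vector<std::string> log_;
};

class transaction : public transaction_context {
public:
  explicit transaction(connection& c) : c_(c) { c_.log().push_back("BEGIN"); }
  std::vector<std::string>& log() override { return c_.log(); }

private:
  connection& c_;
};

class nontransaction : public transaction_context {
public:
  explicit nontransaction(transaction_context& parent) : parent_(parent) {}
  nontransaction(const nontransaction&) = delete;
  nontransaction& operator=(const nontransaction&) = delete;
  std::vector<std::string>& log() override { return parent_.log(); }
  void exec(const std::string& sql) { log().push_back(sql); }

private:
  transaction_context& parent_;
};

// The template from the mail: works for any T derived from the common base.
template<typename T>
void foo(T& context)
{
  nontransaction trans(context);
  trans.exec("SELECT 1");
}
```

One catch: if T is itself a nontransaction, `nontransaction trans(context)` selects the copy constructor rather than the base-class constructor, because it is the better overload match. The sketch deletes the copy constructor so that case fails to compile instead of silently copying; a real design would need a deliberate answer for it.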


Hope this helps. :-)

--Bart
_______________________________________________
Libpqxx-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
