Hi everyone,

Here's a design dilemma for libpqxx 3.0 that I wanted to bounce off those of you who are interested in getting involved in the library's design.
This is quite a long post, so if you don't have time or aren't into design discussions, this is a good time to drop it or start speed-reading! Here's what I'll be going into:

1. Introduction
2. Transaction properties
3. Interaction of properties
4. Design sketch
5. Open issues

Way back when 2.0 came out, it sported a thoroughly redesigned transaction class hierarchy called the Transformed Transaction Taxonomy, or TTT for short. This gave us a lot more flexibility in defining new types of transactions. It also let programs specify things like "any transaction type will do here," or "this transaction can be any type as long as it's really a transaction on the back-end."

The TTT was a big step forward. When PostgreSQL came out with nested-transaction support, creating a subtransaction type turned out to be so easy that it was operational and passing unit tests before I fully realized what I was doing.

Now we're at a new crossroads. New transaction features have been added to the backend, and libpqxx should support them: two-phase transactions are critical for truly dependable complex applications, and read-only transactions could be useful.

Then there's transaction nesting. I'd like that to be a bit more flexible than just having one "outer" transaction and a separate "nested" transaction type within that. Why not allow a nontransaction (which does nothing) within any other transaction type? Why not hide the difference between nested transactions and the basic, "outer" transaction types?

I don't think this can be fitted comfortably into the existing transaction hierarchy. The TTT distinguishes first between "nontransaction" (no real transaction on the backend) and "dbtransaction" (actual atomicity). For the latter, it then offers "regular" and "extra-robust" service levels. Finally, these two types are templated for different isolation levels. Where would read-only and 2-phase transactions fit in? Where should subtransaction really be in the hierarchy?
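To make the shape of the existing hierarchy concrete, here is a rough sketch of the layering just described. These are simplified stand-ins in a made-up `sketch` namespace, not the real libpqxx declarations; the `kind()` member, the two `run_*` functions and the isolation level names are assumptions added purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Simplified stand-ins for the hierarchy described above. These are NOT the
// real libpqxx declarations; members and helpers are illustrative only.
namespace sketch
{
enum class isolation { read_committed, serializable };

// Root of the hierarchy: "any transaction type will do here."
class transaction_base
{
public:
  virtual ~transaction_base() = default;
  virtual std::string kind() const = 0;
};

// No real transaction on the backend.
class nontransaction : public transaction_base
{
public:
  std::string kind() const override { return "nontransaction"; }
};

// "Really a transaction on the back-end": actual atomicity.
class dbtransaction : public transaction_base {};

// Regular service level, templated on isolation level.
template<isolation ISO = isolation::read_committed>
class transaction : public dbtransaction
{
public:
  std::string kind() const override { return "transaction"; }
};

// Extra-robust service level, also templated on isolation level.
template<isolation ISO = isolation::read_committed>
class robusttransaction : public dbtransaction
{
public:
  std::string kind() const override { return "robusttransaction"; }
};

// Accepts any transaction type at all.
inline std::string run_anywhere(transaction_base &t) { return t.kind(); }

// Accepts only real backend transactions; the requirement is checked by the
// compiler rather than by a runtime test.
inline std::string run_atomically(dbtransaction &t) { return t.kind(); }

static_assert(std::is_base_of<dbtransaction, transaction<>>::value,
  "a regular transaction is a backend transaction");
static_assert(!std::is_base_of<dbtransaction, nontransaction>::value,
  "a nontransaction is not a backend transaction");

inline void demo()
{
  nontransaction n;
  robusttransaction<isolation::serializable> r;
  assert(run_anywhere(n) == "nontransaction");
  assert(run_atomically(r) == "robusttransaction");
  // run_atomically(n);  // would not compile: n is not a dbtransaction
}
}
```

With only one inheritance axis, there is no obvious place to hang "read-only" or "two-phase" without multiplying classes, which is the difficulty raised above.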
A single hierarchy just doesn't seem to cut it anymore. There are just too many transaction properties that matter:

Transaction properties

* Bracketing. Does a given transaction type really represent an atomic transaction on the backend, that can be either committed or aborted as a whole?

* Transaction context. Is a given transaction object executing in a backend transaction? A transaction object is in a real transaction context if it's a real backend transaction itself, or if it is nested within one. This could be relevant for things like temp tables, which like to live inside transactions.

* Nesting (requires recent backend). Is a bracketed transaction running inside another bracketed transaction?

* Isolation level. The current solution is a hierarchy where some of the branches "have" isolation levels. A transaction's isolation level defines how changes made in other transaction contexts are visible to that transaction.

* Service level. Having a separate "robust" service level only makes sense in some situations: it's not relevant in nested transactions, for instance.

* Write access (requires recent backend). I'd really like to have read-only transactions as part of the transaction hierarchy, now that postgres supports them. Service levels and transaction bracketing don't matter for read-only transactions.

* Two-phase commit (requires recent backend). This is really cool, and essential in high-end systems: let's say an application performs an action that involves changes to two databases--say a transfer of credit. It can issue "commit" statements on the two databases individually, but what if there's an error from the second database? The two-phase commit scheme lets the application issue a commit that checks for all possible errors and makes the database server remember the changes even after a crash, but isn't quite final. Once both databases have "promised to commit," the application or some middleware can issue a final "second-phase" commit to both.
This (almost) can't possibly go wrong.

I'd like to make as many of these properties as possible visible at compile time, so programs can demand certain properties in ways that don't require testing. But that will get a bit more complicated: you won't be able to say just "this function takes a dbtransaction argument."

Interaction of properties

Now that I've described the essential properties of various transaction types, I'll go into how they interact:

* Nesting, in principle, is something the library can figure out internally.

* Nontransactions (i.e. with no bracketing) could be nested arbitrarily in any context, and apart from bracketing, inherit all of their parents' properties.

* It could make sense to have a read-only transaction nested inside a regular one, but not the other way around.

* Read-only transactions can only occur inside a backend transaction context.

* Service levels (regular and "robust") make sense only for "outer" transactions, not for nested ones. Nor do they make sense for read-only transactions.

* Two-phase transactions don't make sense for read-only transactions either, since there are no changes to be careful about.

* The "robust" service level may use two-phase transactions internally, if the backend version supports them. We could forbid combining them, or revert to a regular service level if a two-phase transaction is ongoing. (Note: I don't think there's a way of making robust transactions an emulation of two-phase commit on backends that don't support two-phase, but even if there is, I don't think it would be quite as safe as the real thing--so better not pretend that it is).

* We could simplify things by saying that a nested transaction should logically see its outer transaction as if it had been committed. After all, if the outer transaction does get rolled back, the nested transaction is aborted as well, so it doesn't matter what it sees or doesn't see.
So one way of seeing things is that nontransactions have read-committed isolation level, and subtransactions inherit whatever isolation level the outer transaction has.

Design sketch

Here's what I had in mind, roughly, for the new transactions setup:

1. Separate "context" and "policy" classes. Connections and transactions would share a common base class that defines them as "contexts you can define transactions in." This has several implementation advantages:

   - Less boilerplate code for delegation: all the nonpublic functions that are currently in the connection hierarchy but must be made available to transaction classes and such will be in a common shared base. Currently this is done using tons of little transaction functions that just call functions with the same name on the transaction's connection.

   - Fewer "friend" declarations to make the above boilerplate code possible; just use "protected" instead.

   - No virtual functions in the transaction hierarchy, only a handful of small virtual functions in the policy classes. Doing this for the connection family worked out really well, but it turned out to be a very hard job for transactions in the current design.

2. No more deep transaction hierarchy visible to the program. We'd end up with a relatively flat list of transaction types along the lines of:

   - nontransaction
   - regular transaction (with isolation level if non-nested)
   - robust transaction (only if non-nested; with isolation level)
   - read-only transaction (with isolation level if non-nested)
   - two-phase transaction (only if non-nested; with isolation level)

3. There is no real distinction between committing and aborting nontransactions or read-only transactions. We can start thinking of these as "zero-phase commit" transaction types!

Open issues

* If we go with a common "transaction context" base class for connections and transactions, should programs be allowed to execute queries directly on the connection without creating any kind of transaction context?
It would make a noticeable difference to very small programs, such as programming examples.

* To what extent should the difference between "outer" and "nested" transactions be hidden? Should there be different types for this? Some of the properties we've looked at are really tied to that one important step, "is this new transaction going to be the outermost bracketing on this session?" We could make that a primary distinction with separate transaction types, or we could hide it and decide at runtime which kind of transaction to implement when the program creates a new transaction object.

* What should the APIs for 0-phase, 1-phase, and 2-phase transactions look like? The current system is effectively that a commit function is available, but you don't need to use it in a 0-phase transaction. But it all works a little differently with 2-phase: the way this is implemented in the backend, 2-phase adds a third alternative (prepare) to the existing ones (commit/abort). It acts like a commit in some ways, except that the transaction's work will only become effective once the ultimate second-phase commit is given. That ultimate commit may come from the same connection and/or program, or from somewhere else; remember that 2-phase must still work even if the system has crashed somewhere between the commits, and the application may have to complete its work after a restart.

* Maybe the pipeline concept can be worked into the execution context hierarchy, so that library features that may need to execute SQL commands internally (variables, nested transactions, tablestreams etc.) can be used normally inside pipelines. Effectively, pipelines would become just another transaction context, with an extended interface. Even nesting of pipelines could be supported: a nested pipeline wouldn't do anything itself, relegating all logic to the outermost pipeline, but this could make it much easier to pipeline sequences of queries generated by the library itself.
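To give the 2-phase API question something concrete to argue about, here is one possible shape for the prepare/commit/abort triple. This is only a sketch: `twophase_transaction`, `resume_prepared` and the `phase` enum are hypothetical names, not existing or planned libpqxx API. The class does not talk to a backend; it just records the statements it would send, using PostgreSQL's `PREPARE TRANSACTION`, `COMMIT PREPARED` and `ROLLBACK PREPARED` commands. It also assumes the transaction identifier contains no quote characters.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace sketch
{
enum class phase { active, prepared, committed, aborted };

// Hypothetical two-phase transaction (not a libpqxx class). Rather than
// talking to a backend, it records the SQL it would send.
class twophase_transaction
{
public:
  // Start a fresh transaction under a caller-chosen global identifier.
  // The sketch assumes the identifier contains no quote characters.
  explicit twophase_transaction(std::string gid) :
    m_gid(std::move(gid)), m_phase(phase::active)
  {
    m_sql.push_back("BEGIN");
  }

  // Pick up a transaction that was prepared earlier, possibly by another
  // program or before a crash. No BEGIN is issued.
  static twophase_transaction resume_prepared(std::string gid)
  {
    return twophase_transaction(std::move(gid), phase::prepared);
  }

  // First phase: the backend checks for errors and remembers the changes,
  // but they do not become effective yet.
  void prepare()
  {
    if (m_phase != phase::active)
      throw std::logic_error("prepare: transaction is not active");
    m_sql.push_back("PREPARE TRANSACTION '" + m_gid + "'");
    m_phase = phase::prepared;
  }

  // Second phase: make the prepared work final.
  void commit()
  {
    if (m_phase != phase::prepared)
      throw std::logic_error("commit: transaction is not prepared");
    m_sql.push_back("COMMIT PREPARED '" + m_gid + "'");
    m_phase = phase::committed;
  }

  // Abort works before or after prepare, but not once the work is final.
  void abort()
  {
    if (m_phase == phase::active)
      m_sql.push_back("ROLLBACK");
    else if (m_phase == phase::prepared)
      m_sql.push_back("ROLLBACK PREPARED '" + m_gid + "'");
    else
      throw std::logic_error("abort: transaction already finished");
    m_phase = phase::aborted;
  }

  phase current_phase() const { return m_phase; }
  const std::vector<std::string> &sql() const { return m_sql; }

private:
  twophase_transaction(std::string gid, phase p) :
    m_gid(std::move(gid)), m_phase(p)
  {}

  std::string m_gid;
  phase m_phase;
  std::vector<std::string> m_sql;
};

// Usage: both sides of a credit transfer promise to commit before either
// commit is made final.
inline void transfer_demo()
{
  twophase_transaction debit("transfer-1-debit"), credit("transfer-1-credit");
  debit.prepare();
  credit.prepare();
  debit.commit();
  credit.commit();
  assert(debit.current_phase() == phase::committed);
  assert(credit.current_phase() == phase::committed);
}
}
```

One choice this sketch makes, and which is open to debate, is that `commit()` refuses to run before `prepare()`. An alternative would be to let `commit()` on an active transaction perform both phases in one go.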
So that's what's going on design-wise at the moment. Most of you will have something better to do than to engage in a discussion about this, and that's fine; if nothing else, I'll have documented some important considerations and hammered out some concepts I needed to get clear in my own mind. But if one of you has any useful thoughts on these issues, I'd love to hear them!

Jeroen

_______________________________________________
Libpqxx-general mailing list
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general
