On 27/08/15 16:53, [email protected] wrote:
> Andy -- Thanks, these comments are really helpful! I've replied in-line
> in a few places to clarify or answer questions, or ask some of my own.
> {grin}
>
> ---
> A. Soroka
> The University of Virginia Library
>> If there are multiple writers, then (1) system aborts will always be
>> possible (conflicting updates) and (2) locking on datastructures is
>> necessary ... or timestamps and vector clocks or some such.
>
> Right, see below. Again, there are multiple writers, but they only see
> themselves, and only one committer.

"Only one committer at a time" prevents conflicts, since there is no
schema to violate, but it is a brutal way to deal with the problem. And
the "re-run" scheme of operation means it will be a very real bottleneck.

>>> 5) Snapshot isolation. Transactions do not see commits that occur
>>> during their lifetime. Each works entirely from the state of the
>>> DatasetGraph at the start of its life.
>>
>> But they see their own updates presumably?
>
> Right, that's exactly the purpose of taking off their own reference to
> the persistent datastructures at the start of the transaction. They
> "evolve" their datastructures independently.
When used in a program, persistent datastructures diverge when two
writers act from the same base point. Transactions do more: they
serialize all operations so that there is a linear sequence of versions.
This is the problem you identify below.
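The divergence is easy to see in a few lines of Java. This is only a
sketch: "persistent" is simulated here with copy-on-write over
java.util.Set (a real implementation would use a structurally-sharing
library such as pcollections), and all names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

public class Divergence {
    // Adding returns a new set; the base set is never mutated.
    static Set<String> add(Set<String> base, String triple) {
        Set<String> next = new HashSet<>(base);
        next.add(triple);
        return next;
    }

    public static void main(String[] args) {
        Set<String> committed = add(new HashSet<>(), ":s :p :o");

        // Two writers branch from the same base point...
        Set<String> writer1 = add(committed, ":s :p :o1");
        Set<String> writer2 = add(committed, ":s :p :o2");

        // ...and diverge: neither sees the other's update, and the base
        // is unchanged. Something must serialize them at commit time.
        System.out.println(writer1.contains(":s :p :o2")); // false
        System.out.println(writer2.contains(":s :p :o1")); // false
        System.out.println(committed.size());              // 1
    }
}
```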
>>> 6) Only as many as one transaction per thread, for now. Transactions
>>> are not thread-safe. These are simplifying assumptions that could be
>>> relaxed later.
>>
>> TDB ended up there as well. There is, internally, a transaction object,
>> but it's held in a ThreadLocal and fetched when needed. Otherwise a lot
>> of interfaces need a "transaction" parameter, and it's hard to reuse
>> other code that doesn't pass it through.
>
> That's close to what I sketched out.
>
>> I have taken a second take on transactions with TDB2. This module is an
>> independent transaction system, unlike TDB1, where it's TDB1-specific:
>>
>>   https://github.com/afs/mantis/tree/master/dboe-transaction
>>
>> It needs documentation for use on its own, but I have used it in
>> another project to coordinate distributed transactions.
>> (dboe = database operating environment)
>
> I need to study this more. Obviously, if I can take over some of your
> work, that would be ideal.
>
>>> My current design operates as follows: <snipped>
>>
>> Looks good. I don't quite understand the need to record and rerun,
>> though - isn't the power of pcollections that there can be old and new
>> roots to the datastructures, and commit is a swap to the new one, abort
>> is forgetting the new one?
>
> Yeah, but my worry (perhaps just my misunderstanding) is over
> transactions interacting badly in the presence of snapshot isolation.
> Let's say we did use the technique of atomic swap, and consider the
> following scenario:
>
> T=-1  The committed datastructures contain triples T.
> T=0   Transaction 1 begins, taking a reference to the datastructures.
> T=1   Transaction 2 begins, taking its own reference to the
>       datastructures.
> T=3   Transaction 1 does some updates, adding some triples T_1 to its
>       own "branch", resulting in T + T_1.
> T=4   Transaction 2 does some updates, adding some triples T_2 to its
>       own "branch", resulting in T + T_2.
> T=5   Transaction 1 commits, so that the committed triples are now
>       T + T_1.
> T=6   Transaction 2 commits, so that the committed triples are now
>       T + T_2.
>
> We lost Transaction 1's T_1 triples. I think this technique actually
> requires _merge_ instead of swap: either merge-into-open-transactions
> (after a commit), which isn't snapshot isolation, or merge-into-commit
> (instead of swap-into-commit). But there's plenty of chance that I'm
> just misunderstanding this whole thing. {grin} I have not designed a
> transaction system over persistent datastructures before, and I welcome
> correction. I also need to research more about persistent datastructures
> with merge capability.
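That scenario can be replayed mechanically. A minimal Java sketch (the
AtomicReference root and all names are illustrative assumptions, not the
actual design): a blind swap reproduces the lost update, while a
compare-and-swap from each transaction's base reference at least detects
the conflict, which is where abort/rerun (or merge) comes in.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class SwapLostUpdate {
    // Adding returns a new set; the base is never mutated.
    static Set<String> with(Set<String> base, String triple) {
        Set<String> next = new HashSet<>(base);
        next.add(triple);
        return next;
    }

    public static void main(String[] args) {
        Set<String> initial = Set.of("T");
        AtomicReference<Set<String>> committed = new AtomicReference<>(initial);

        // T=0, T=1: both transactions take a reference at begin().
        Set<String> base1 = committed.get();
        Set<String> base2 = committed.get();

        // T=3, T=4: each evolves its own branch.
        Set<String> txn1 = with(base1, "T_1");
        Set<String> txn2 = with(base2, "T_2");

        // T=5, T=6 with blind swap: transaction 2 silently discards T_1.
        committed.set(txn1);
        committed.set(txn2);
        System.out.println(committed.get().contains("T_1")); // false - lost

        // Replaying with compare-and-swap from each transaction's base:
        // the second commit fails instead of losing data silently.
        committed.set(initial);
        boolean ok1 = committed.compareAndSet(base1, txn1);
        boolean ok2 = committed.compareAndSet(base2, txn2);
        System.out.println(ok1 + " " + ok2); // true false - conflict seen
    }
}
```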
Which is why 2+ writers need locking or aborts. The common ACID example:
start with

    :account :balance 10 .

W1 (adds 5 to the account):

    Delete :account :balance 10 .
    Insert :account :balance 15 .

W2 (adds 7 to the account):

    Delete :account :balance 10 .
    Insert :account :balance 17 .

Oh dear. No amount of merge or swap will work. Either W2 (or W1) is
aborted, or you get inconsistency.

If you really, really want true parallel writers, then you'll need more
than a rerun with a fixed resolution algorithm. It is hard enough in RDF
to even detect that there is a conflict.
An application transaction deciding for itself to abort is rare, so most
overlapping writers will both try to commit. All the work of one is
always going to be lost, presumably with a retry, and that means the
application writer getting involved.
In an SQL database, a row lock on the account resolves the problem. But
there isn't anything in the RDF data that plays the same role as the row
does in SQL:
    :account :balance [ :currency "USD" ; :value 10 ] .

so locking ":account" does not work: a conceptual entity isn't tied to a
single graph node.
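For what it's worth, the balance example can be run through a set-level
merge to see the inconsistency directly. A sketch with triples as plain
strings (the apply() helper is illustrative, not from any real API):

```java
import java.util.HashSet;
import java.util.Set;

public class MergeInconsistency {
    // Apply a delete/insert delta to a graph, returning a new graph.
    static Set<String> apply(Set<String> graph,
                             Set<String> deletes, Set<String> inserts) {
        Set<String> next = new HashSet<>(graph);
        next.removeAll(deletes);
        next.addAll(inserts);
        return next;
    }

    public static void main(String[] args) {
        Set<String> base = Set.of(":account :balance 10 .");

        // Merge both writers' deltas, each recorded against the same base.
        Set<String> merged = apply(base,
                Set.of(":account :balance 10 ."),   // W1 delete
                Set.of(":account :balance 15 ."));  // W1 insert
        merged = apply(merged,
                Set.of(":account :balance 10 ."),   // W2 delete (already gone)
                Set.of(":account :balance 17 ."));  // W2 insert

        // Both balances survive: the merged graph is inconsistent, and
        // nothing at the triple level flags these two as conflicting.
        System.out.println(merged.contains(":account :balance 15 .")); // true
        System.out.println(merged.contains(":account :balance 17 .")); // true
    }
}
```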
A single true writer does not suffer from this. Nor from dirty reads,
nor from phantom reads. (Preventing those normally requires thread
locking, and they don't arise with persistent datastructures.)
But multiple true writers aren't a common use case - multiple readers
are. MR+SW (multiple readers + single writer) lets readers proceed at
any time without blocking, and writers never system-abort.
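A minimal MR+SW sketch over a persistent root (illustrative names, not
TDB or dboe-transaction code): readers just take the current root
reference, so they never block; writers queue on a single lock, so a
committing writer can never be surprised and never system-aborts.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

public class MrSw {
    private final AtomicReference<Set<String>> root =
            new AtomicReference<>(Set.of());
    private final ReentrantLock writeLock = new ReentrantLock();

    // Readers: a snapshot is just the current root reference.
    Set<String> beginRead() {
        return root.get();
    }

    // Writers: serialized by the lock, so commit is a simple publish.
    void write(String triple) {
        writeLock.lock();
        try {
            Set<String> next = new HashSet<>(root.get());
            next.add(triple);
            root.set(next);          // commit = swap in the new root
        } finally {
            writeLock.unlock();      // abort would simply skip the set()
        }
    }

    public static void main(String[] args) {
        MrSw db = new MrSw();
        Set<String> snapshot = db.beginRead();
        db.write(":s :p :o");
        System.out.println(snapshot.size());        // 0 - snapshot unchanged
        System.out.println(db.beginRead().size());  // 1
    }
}
```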
Andy
