Thoughts on Transactions, Concurrency, Versioning and Locking

Reto Bachmann-Gmuer Mon, 27 Sep 2010 00:57:04 -0700

*Wishlist*

   - CLEREZZA-219: Rollback / atomicity
   - A committed transaction is like a patch to a set of graphs
   - A transaction can be copied and executed on another instance
   - Optimistic concurrency control
   - Code that doesn't modify MGraphs needs no transactions
   - No more ConcurrentModificationException: iterators returned by MGraphs
   point to the version at the moment of invocation
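One way to get the last wishlist item is to publish every version of a graph as an immutable set, so that an iterator simply walks the version that was current when it was created. This is a minimal sketch under a simplified in-memory model: `Triple` (string nodes, BNodes prefixed with "_:") and `SnapshotGraph` are hypothetical names made up for illustration, not the Clerezza MGraph API.

```scala
// Hypothetical, simplified triple model: nodes are plain strings, BNodes start with "_:".
final case class Triple(s: String, p: String, o: String)

// Every write publishes a new immutable Set; iterator() captures the Set that is
// current at the moment of invocation, so later writes cannot invalidate it.
final class SnapshotGraph {
  @volatile private var current: Set[Triple] = Set.empty

  // Writers are serialized; each one replaces the whole (immutable) version.
  def add(t: Triple): Unit = synchronized { current = current + t }
  def remove(t: Triple): Unit = synchronized { current = current - t }

  def iterator(): Iterator[Triple] = current.iterator
  def size: Int = current.size
}

val g = new SnapshotGraph
g.add(Triple("_:a", "rdf:type", "ex:Animal"))
val iter = g.iterator() // bound to the one-triple version
g.add(Triple("_:a", "rdf:type", "ex:Cat"))
val seen = iter.toList  // no ConcurrentModificationException; still the old version
```

The cost of this design is one new set per write; persistent immutable sets share most of their structure, so each write copies only a small part.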


*Problems*

   - Consistency with BNodes
      - Suppose we have an MGraph with: _:a rdf:type ex:Animal
      - The transactions t1 and t2 both read the single triple of the above
      MGraph
      - t1 adds _:a rdf:type ex:Cat
      - t2 adds _:a rdf:type ex:Dog
      - The transaction that commits second should fail to commit
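A minimal sketch of how this conflict could be detected with optimistic concurrency control, assuming the commit-time check proposed in this mail (removed triples still present, context of affected BNodes unchanged). Here the "context" of a BNode is simplified to all triples in which it occurs; `Triple`, `Patch`, `context` and `commit` are hypothetical names, not existing Clerezza API.

```scala
// Hypothetical, simplified triple model: nodes are plain strings, BNodes start with "_:".
final case class Triple(s: String, p: String, o: String)

object BNodeConflictSketch {
  // Simplified "context" of a BNode: every triple it occurs in. A fuller
  // definition would also follow other BNodes reachable from it.
  def context(graph: Set[Triple], bnode: String): Set[Triple] =
    graph.filter(t => t.s == bnode || t.o == bnode)

  // A transaction's changes plus the BNode contexts it observed while reading.
  final case class Patch(added: Set[Triple], removed: Set[Triple],
                         observed: Map[String, Set[Triple]])

  // Optimistic commit: apply the patch only if the removed triples are still
  // present and every observed BNode context is unchanged.
  def commit(graph: Set[Triple], p: Patch): Either[String, Set[Triple]] = {
    val removedStillThere = p.removed.subsetOf(graph)
    val contextsUnchanged = p.observed.forall { case (b, ctx) => context(graph, b) == ctx }
    if (removedStillThere && contextsUnchanged) Right(graph -- p.removed ++ p.added)
    else Left("conflict: the graph changed since the transaction read it")
  }
}

import BNodeConflictSketch._

val base = Set(Triple("_:a", "rdf:type", "ex:Animal"))
val seenCtx = Map("_:a" -> context(base, "_:a")) // what both t1 and t2 read
val t1 = Patch(Set(Triple("_:a", "rdf:type", "ex:Cat")), Set.empty, seenCtx)
val t2 = Patch(Set(Triple("_:a", "rdf:type", "ex:Dog")), Set.empty, seenCtx)
val afterT1 = commit(base, t1)                    // Right: t1 commits first
val afterT2 = afterT1.flatMap(g => commit(g, t2)) // Left: the context of _:a changed
```

Because t1 and t2 recorded the same one-triple context, whichever commits second sees a two-triple context and is rejected, regardless of thread scheduling.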

*Solution approach*

I'm unfamiliar with the Java Transaction API, so I'm not sure how it could
replace or affect the following.

I imagine the API being used like this (in Scala; in Java some additional
interfaces would be needed):

val tm: TcTransactionManager = ...
val t = tm.createTransaction { tcManager =>
  val mGraph = tcManager.getMGraph(new UriRef("http://example.org/test.graph"))
  //within this block it appears as if there's no one else in the world, we
  //don't have to care about locking
}
//retries up to 5 times if applying the resulting patch fails;
//transactions that perform only read operations produce an empty patch,
//so their commit never fails
val result = t.commit(5)

//we could also run the transaction without any change actually being done
//to the tcManager
//val result = t.simulate()
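To make the intended `commit(5)` semantics concrete, here is an in-memory sketch for a single graph: the transaction function runs against a snapshot, its writes are collected, and publishing the result is retried by re-running the function if another commit got in first. `MiniTransactionManager`, `TxView` and `Triple` are hypothetical stand-ins for `TcTransactionManager` and `TcManager`; for brevity the sketch uses a coarse whole-graph compare-and-set (`AtomicReference.compareAndSet`) rather than the finer patch-compatibility check.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, simplified triple model.
final case class Triple(s: String, p: String, o: String)

// Hypothetical stand-in for the TcManager a transaction function receives:
// reads see a frozen snapshot plus this transaction's own pending changes.
final class TxView(snapshot: Set[Triple]) {
  var added = Set.empty[Triple]
  var removed = Set.empty[Triple]
  def triples: Set[Triple] = snapshot -- removed ++ added
  def add(t: Triple): Unit = { added += t; removed -= t }
  def remove(t: Triple): Unit = { removed += t; added -= t }
}

// Hypothetical manager for one graph; not an existing Clerezza interface.
final class MiniTransactionManager(initial: Set[Triple]) {
  private val state = new AtomicReference[Set[Triple]](initial)
  def graph: Set[Triple] = state.get()

  final class Transaction(body: TxView => Unit) {
    // One initial attempt plus up to maxRetries re-runs of the function.
    def commit(maxRetries: Int): Boolean = {
      var attempt = 0
      while (attempt <= maxRetries) {
        val snapshot = state.get()
        val view = new TxView(snapshot)
        body(view)
        // Read-only transactions produce an empty patch and always succeed.
        if (view.added.isEmpty && view.removed.isEmpty) return true
        // Publish only if nobody committed since our snapshot was taken.
        if (state.compareAndSet(snapshot, view.triples)) return true
        attempt += 1
      }
      false
    }
  }

  def createTransaction(body: TxView => Unit): Transaction = new Transaction(body)
}

val tm = new MiniTransactionManager(Set.empty)
val t = tm.createTransaction { g =>
  g.add(Triple("ex:tom", "rdf:type", "ex:Cat"))
}
val ok = t.commit(5)
```

Re-running the whole function on conflict, rather than re-applying the old patch, means the retry sees the other transaction's changes, which is what makes the retry safe.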

A transaction is specified by a function that takes a TcManager. The
functionality of the transaction is implemented in this function, and all
access to MGraphs occurs via the TcManager received as an argument. The list
of triple collections, as well as the triple collections returned by this
TcManager, appear not to change except for the changes made within the
transaction function itself.
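This isolation could be approximated by a transaction-local view that freezes the map from graph names to graph contents when the transaction starts and overlays only the transaction's own writes. `TxTcManager`, its methods and the plain string graph names are hypothetical; the TcManager in the mail is addressed with `UriRef` names and returns MGraph objects.

```scala
// Hypothetical, simplified triple model.
final case class Triple(s: String, p: String, o: String)

// Hypothetical transaction-local view (not the real Clerezza TcManager).
// It freezes name -> contents at transaction start; only this view's own
// writes show up in what it returns.
final class TxTcManager(snapshot: Map[String, Set[Triple]]) {
  private var added = Map.empty[String, Set[Triple]]
  private var removed = Map.empty[String, Set[Triple]]

  private def get(m: Map[String, Set[Triple]], name: String): Set[Triple] =
    m.getOrElse(name, Set.empty[Triple])

  // The list of graphs is fixed for the lifetime of the transaction.
  def listGraphs: Set[String] = snapshot.keySet

  // Snapshot contents minus own removals plus own additions.
  // Throws NoSuchElementException for a graph that did not exist at start.
  def triples(name: String): Set[Triple] =
    snapshot(name) -- get(removed, name) ++ get(added, name)

  def add(name: String, t: Triple): Unit = {
    added = added.updated(name, get(added, name) + t)
    removed = removed.updated(name, get(removed, name) - t)
  }

  def remove(name: String, t: Triple): Unit = {
    removed = removed.updated(name, get(removed, name) + t)
    added = added.updated(name, get(added, name) - t)
  }

  // The patch handed over at commit: graph name -> (added, removed).
  def patch: Map[String, (Set[Triple], Set[Triple])] =
    (added.keySet ++ removed.keySet).iterator
      .map(n => n -> ((get(added, n), get(removed, n))))
      .filter { case (_, (a, r)) => a.nonEmpty || r.nonEmpty }
      .toMap
}

val name = "http://example.org/test.graph"
val tx = new TxTcManager(Map(name -> Set(Triple("_:a", "rdf:type", "ex:Animal"))))
tx.add(name, Triple("ex:tom", "rdf:type", "ex:Cat"))
```

Since the snapshot map and its sets are immutable, concurrent commits elsewhere cannot change what this view reports.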

   - When a transaction is committed or a read access is performed, all
   previous write calls are transformed into a patch, and this patch is
   associated with the transaction.
   - All read accesses are performed against a base graph plus a set of
   patches: those of transactions that have been committed but not yet
   applied to the base graph, and those associated with the transaction
   performing the read operation.
   - Changes within a transaction produce a patch. At commit time it is
   checked that the MGraph is still compatible with the change (removed
   triples are still there and the context of affected BNodes is unchanged).
   - When a triple containing an existing BNode (one that has not been added
   within the transaction) is added or removed, the context of this BNode in
   the original graph is marked for removal and a replacement subgraph with
   distinct BNode objects is created.
   - A scheduled task monitors the read lock on MGraphs and chooses suitable
   moments for applying committed patches to the base MGraph; only during
   this operation is the base MGraph write-locked. No transaction function is
   executed while patches are applied, and no iterator should be open on the
   MGraph, or else some mechanism to avoid ConcurrentModificationExceptions
   must be applied.
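The read path from the list above can be sketched as follows, assuming patches are plain added/removed triple sets kept in commit order: for a given triple, the newest patch that mentions it decides, and only if none does is the base graph consulted. Pattern queries collect candidates from the base graph and from the patches' additions, then filter them through the same membership test. `Triple`, `Patch` and `LayeredRead` are hypothetical names.

```scala
// Hypothetical, simplified model: string nodes, and a patch is a pair of triple
// sets (assumed disjoint: a patch never adds and removes the same triple).
final case class Triple(s: String, p: String, o: String)
final case class Patch(added: Set[Triple], removed: Set[Triple])

object LayeredRead {
  // pending: committed but not yet applied patches, oldest first.
  // own: the reading transaction's uncommitted patch, if any (treated as newest).
  def contains(base: Set[Triple], pending: Seq[Patch], own: Option[Patch], t: Triple): Boolean =
    (pending ++ own).reverseIterator.collectFirst {
      case p if p.added.contains(t)   => true
      case p if p.removed.contains(t) => false
    }.getOrElse(base.contains(t))

  // Pattern query; None acts as a wildcard.
  def filter(base: Set[Triple], pending: Seq[Patch], own: Option[Patch],
             s: Option[String], p: Option[String], o: Option[String]): Set[Triple] = {
    def matches(t: Triple): Boolean =
      s.forall(_ == t.s) && p.forall(_ == t.p) && o.forall(_ == t.o)
    val candidates = base.iterator ++ (pending ++ own).iterator.flatMap(_.added)
    candidates.filter(matches).filter(contains(base, pending, own, _)).toSet
  }
}

val a = Triple("_:a", "rdf:type", "ex:Animal")
val c = Triple("_:a", "rdf:type", "ex:Cat")
val pending = Seq(Patch(Set(c), Set.empty), Patch(Set.empty, Set(a)))
val typesOfA = LayeredRead.filter(Set(a), pending, None, Some("_:a"), Some("rdf:type"), None)
```

Each lookup walks the pending patches before the base graph, so its cost grows with the number of unapplied patches.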


*Performance*

On read access, a set of patches has to be checked in addition to the base
graph, both for additions and for removals of triples. As the number of
patches should be relatively small under normal circumstances, this
shouldn't be too costly. Write operations should get significantly faster,
as generating and adding a patch does not require a write lock on the base
graph.
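The locking scheme behind these performance claims might look like this sketch: a commit only appends its patch to a concurrent queue, readers hold the read lock while consulting pending patches and the base graph, and a background task takes the write lock just long enough to fold the queued patches into the base graph. `PatchQueueGraph`, `Triple` and `Patch` are hypothetical, and the commit-time compatibility check is left out; the JDK classes used are `ReentrantReadWriteLock`, `ConcurrentLinkedQueue` and `ScheduledExecutorService`.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.jdk.CollectionConverters._

final case class Triple(s: String, p: String, o: String)
final case class Patch(added: Set[Triple], removed: Set[Triple])

final class PatchQueueGraph(initial: Set[Triple]) {
  private val lock = new ReentrantReadWriteLock()
  private var base: Set[Triple] = initial // guarded by lock
  private val pending = new ConcurrentLinkedQueue[Patch]()

  // Committing only enqueues the patch: no write lock on the base graph.
  def commit(p: Patch): Unit = pending.add(p)

  def contains(t: Triple): Boolean = {
    lock.readLock().lock()
    try {
      // The newest pending patch that mentions t wins; otherwise ask the base.
      pending.asScala.toList.reverseIterator.collectFirst {
        case p if p.added.contains(t)   => true
        case p if p.removed.contains(t) => false
      }.getOrElse(base.contains(t))
    } finally lock.readLock().unlock()
  }

  // Folds all queued patches into the base in commit order; returns the count.
  def applyPending(): Int = {
    lock.writeLock().lock()
    try {
      var n = 0
      var p = pending.poll()
      while (p != null) {
        base = base -- p.removed ++ p.added
        n += 1
        p = pending.poll()
      }
      n
    } finally lock.writeLock().unlock()
  }

  def pendingCount: Int = pending.size

  // Runs the applier periodically on the given executor.
  def scheduleApplier(exec: ScheduledExecutorService, periodMs: Long): Unit = {
    exec.scheduleWithFixedDelay(() => { applyPending(); () }, periodMs, periodMs, TimeUnit.MILLISECONDS)
    ()
  }
}

val a = Triple("_:a", "rdf:type", "ex:Animal")
val c = Triple("_:a", "rdf:type", "ex:Cat")
val g = new PatchQueueGraph(Set(a))
g.commit(Patch(Set(c), Set(a)))
val visibleBeforeApply = g.contains(c)
val applied = g.applyPending()
```

A fixed delay is the simplest trigger; the mail instead proposes choosing the moment by watching read-lock activity, which this sketch does not attempt.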
