On 19/05/14 21:22, Stephen Allen wrote:
All,

I've been working on a design for remote transactions for SPARQL (initially
for the query and update endpoints, but most likely for GSP as well).  My
initial draft is at [1].  I would appreciate any feedback, particularly in
the places where I have made notes (the @@ sections).

Although I tried to design it to not preclude distributed transactions, I
am intentionally limiting the specification to what is necessary for a
single server currently.

Implementation seems like it should be fairly straightforward, targeting
Fuseki 2 / TDB.  There look to be a few bumps that need to be overcome
(particularly TDB's usage of ThreadLocal variables
in DatasetGraphTransaction), but there do not appear to be any
showstoppers.  Although I have not started writing any code yet.

There is DatasetGraphTxn, direct from the StoreConnection, which is the actual one-time transaction object that goes in the thread local.

It itself needs MRSW (multiple reader, single writer) semantics.


I have started JENA-700 to track this work [2].

-Stephen

[1] http://people.apache.org/~sallen/sparql11-transaction/
[2] https://issues.apache.org/jira/browse/JENA-700



== Protocol

POST  /transaction
GET   /transaction/txid

Nice way to view a transaction - I'd have made the txid a large globally unique number so that a third party has no realistic chance of guessing it.
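To make the txid point concrete, here is a minimal sketch, assuming the server mints the identifier itself. The helper name `new_txid` and the 128-bit size are my own illustrative choices, not something from the draft.

```python
import secrets


def new_txid() -> str:
    """Return a 128-bit identifier as 32 hex characters.

    Hypothetical helper: the name and the size are assumptions.
    The value comes from the OS cryptographic random source, so it is
    impractical for a third party to guess.
    """
    return secrets.token_hex(16)


txid = new_txid()
# The path layout follows the "GET /transaction/txid" line above.
print(f"/transaction/{txid}")
```

A counter or timestamp would also be unique, but it would be easy to predict, which is the risk being avoided here.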

A container for transactions makes sense but the other mapping to HTTP verbs seems odd to me.

The transaction state is part of the document at /transaction/txid.

So PUT/DELETE does not seem to me the right way to handle it, as the txid (probably) does not exist after DELETE. A system may, though, wish to see the state of a completed transaction after an abort.

REST is state exchange, so my inclination would be to POST or PUT a new state document to the server to commit or abort the transaction (or to use a query string parameter).

POST is preferable because I think it is changing part of the state of the transaction. Information like the start time of the transaction, who started it, ... is not overwritten and can be guaranteed by the server. PUT is about the whole document being replaced by whatever the client claims.
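A rough sketch of the difference, under the assumption that the transaction document is a small record. All field names (`txid`, `started`, `startedBy`, `state`), the sample values and both function names are hypothetical, chosen only for illustration.

```python
import copy

# Assumed state machine: an active transaction may commit or abort, nothing else.
ALLOWED_TRANSITIONS = {"active": {"committed", "aborted"}}


def post_state(doc: dict, request: dict) -> dict:
    """POST-style update: only 'state' changes; every other field is kept."""
    new_state = request.get("state")
    if new_state not in ALLOWED_TRANSITIONS.get(doc["state"], set()):
        raise ValueError(f"cannot move from {doc['state']!r} to {new_state!r}")
    updated = copy.deepcopy(doc)
    updated["state"] = new_state
    return updated


def put_document(doc: dict, request: dict) -> dict:
    """PUT-style replace: the client's document becomes the resource as sent."""
    return copy.deepcopy(request)


# Illustrative values only.
doc = {"txid": "abc123", "started": "T0", "startedBy": "alice", "state": "active"}
client = {"state": "committed", "startedBy": "mallory"}

after_post = post_state(doc, client)    # startedBy is still "alice"
after_put = put_document(doc, client)   # startedBy is now "mallory", txid is gone
```

With the POST-style handler the server decides which parts of the client's document it honours, so the start time and owner stay trustworthy.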

"authenticated sessions" (if you mean two-way certificated https) are very complicated to manage.

== Promotable transactions

Transactions that start READ and become WRITE aren't ruled out in TDB -- they could cause an abort at the point when they get promoted, but any system has that possibility, either at this point or at the commit point, when updates conflict. In RDF, conflict is messier because RDF triples do not correspond to application conceptual entities, leading either to unexpected conflicts or to conflict tracking so fine-grained that performance is less than acceptable.

With SPARQL operations being coarse grained read/query or write/update, promotability makes more sense.
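As a toy model of promotion (not how TDB works internally, and not thread-safe), assume the store keeps a version counter that every committed write bumps. A READ transaction remembers the version it started at and can only be promoted if nothing has been committed since and no other writer is active. All class and method names here are hypothetical.

```python
class PromotionConflict(Exception):
    """Promotion refused: the data changed, or another writer is active."""


class ToyStore:
    def __init__(self):
        self.version = 0            # bumped on every committed write
        self.triples = set()
        self.active_writer = None   # single-writer slot


class ToyTxn:
    def __init__(self, store):
        self.store = store
        self.mode = "READ"
        self.start_version = store.version
        self.pending = set()

    def promote(self):
        if self.store.active_writer is not None:
            raise PromotionConflict("another writer is active")
        if self.store.version != self.start_version:
            raise PromotionConflict("dataset changed since the read began")
        self.store.active_writer = self
        self.mode = "WRITE"

    def add(self, triple):
        if self.mode != "WRITE":
            raise RuntimeError("not a write transaction")
        self.pending.add(triple)

    def commit(self):
        if self.mode == "WRITE":
            self.store.triples |= self.pending
            self.store.version += 1
            self.store.active_writer = None


store = ToyStore()
t1, t2 = ToyTxn(store), ToyTxn(store)   # both start as READ at version 0

t1.promote()
t1.add(("s", "p", "o"))
t1.commit()                             # version is now 1

try:
    t2.promote()                        # t2 read at version 0, so this fails
    outcome = "promoted"
except PromotionConflict:
    outcome = "aborted"
```

Because the check is on the whole dataset rather than on individual triples, this is the coarse-grained end of the trade-off described above.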

== Timeouts and deadlocks

The SPARQL protocol operations within a transaction still need to be atomic with respect to the transaction. The client may now have multiple threads invoking operations - or pass the transaction id to another machine.

Multiple writers: For TDB, it's internally single writer with a lock for multiple writers so there is deadlock potential with two writers. We could add a "begin(WRITE)-or-bounce" operation.

(If a system is multiple writer, the pain point moves to system-generated aborts, which don't happen in TDB).

== ETags and optimistic transactions

Have you any thoughts on how this might play with ETags?

        Andy
