Re: SPARQL Transaction

Andy Seaborne Tue, 20 May 2014 07:43:59 -0700

On 19/05/14 21:22, Stephen Allen wrote:

All,


I've been working on a design for remote transactions for SPARQL (initially
for the query and update endpoints, but most likely for GSP as well).  My
initial draft is at [1].  I would appreciate any feedback, particularly in
the places where I have made notes (the @@ sections).

Although I tried to design it to not preclude distributed transactions, I
am intentionally limiting the specification to what is necessary for a
single server currently.

Implementation seems like it should be fairly straightforward, targeting
Fuseki 2 / TDB.  There look to be a few bumps that need to be overcome
(particularly TDB's usage of ThreadLocal variables
in DatasetGraphTransaction), but there do not appear to be any
showstoppers.  Although I have not started writing any code yet.

There is DatasetGraphTxn, direct from the StoreConnection, which is the actual one-time transaction object that goes in the thread local.


It itself needs MRSW semantics.


I have started JENA-700 to track this work [2].

-Stephen

[1] http://people.apache.org/~sallen/sparql11-transaction/
[2] https://issues.apache.org/jira/browse/JENA-700



== Protocol

POST  /transaction
GET   /transaction/txid

Nice way to view a transaction - I'd have made the txid a large globally unique number so there is no risk of a third party guessing it.
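
As a minimal sketch of an unguessable txid, one option is to draw it from a cryptographically secure random source. The helper name `new_txid` and the 128-bit size are assumptions for illustration; neither comes from the draft.

```python
import secrets


def new_txid(nbytes: int = 16) -> str:
    """Return a random transaction id as a hex string (hypothetical helper)."""
    # secrets draws from the OS CSPRNG, so an id cannot be predicted
    # from ids that were handed out earlier.
    return secrets.token_hex(nbytes)


txid = new_txid()
print(f"/transaction/{txid}")
```

A sequential counter would be simpler, but it would let a third party guess the ids of other clients' transactions.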

A container for transactions makes sense but the other mapping to HTTP verbs seems odd to me.


The transaction state is part of the document at /transaction/txid.

So PUT/DELETE does not seem to me the right way to handle it, as the txid does not exist (probably) after DELETE. A system may wish to see the state of a completed transaction after abort, though.

REST is state exchange, so a POST or PUT of a new state document to the server to commit or abort the transaction (or use a query string parameter) would be my inclination.
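
A rough sketch of what the two verbs could mean on the server side, assuming the transaction document is a small JSON-like record. The field names (`txid`, `started`, `startedBy`, `state`) and the state values are hypothetical, not taken from the draft.

```python
import copy

# Hypothetical set of states a transaction document may be in.
ALLOWED_STATES = {"active", "committed", "aborted"}


def apply_post(doc: dict, update: dict) -> dict:
    """POST-style update: only the state changes; other fields stay as stored."""
    new_state = update.get("state")
    if new_state not in ALLOWED_STATES:
        raise ValueError(f"unknown state: {new_state!r}")
    if doc["state"] != "active":
        raise ValueError("transaction already finished")
    result = copy.deepcopy(doc)
    result["state"] = new_state  # anything else the client sent is ignored
    return result


def apply_put(doc: dict, replacement: dict) -> dict:
    """PUT-style update: the client's document replaces the stored one."""
    return copy.deepcopy(replacement)


doc = {"txid": "abc", "started": "t0", "startedBy": "alice", "state": "active"}
claim = {"state": "committed", "startedBy": "mallory"}

print(apply_post(doc, claim)["startedBy"])  # alice
print(apply_put(doc, claim)["startedBy"])   # mallory
```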

POST is preferable because I think it is changing part of the state of the transaction. Information like start of transaction time, who started it, ... is not overwritten and can be guaranteed by the server. PUT is about the whole document being replaced by whatever the client claims.

"authenticated sessions" (if you mean two-way certificated https) are very complicated to manage.


== Promotable transactions

Transactions that start READ and become WRITE aren't ruled out in TDB -- they would have the effect of potentially causing an abort at the point when they get promoted, but any system has that possibility at this point or at commit point with conflicting updates. In RDF, conflict is messier because RDF triples do not correspond to application conceptual entities, leading to unexpected conflicts, or to conflict detection so fine grained that the system has less than acceptable performance.

With SPARQL operations being coarse grained read/query or write/update, promotability makes more sense.
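
To make the abort-on-promotion point concrete, here is a generic sketch that uses a dataset version counter. It is not how TDB works internally; `Store`, `Txn` and `PromotionConflict` are hypothetical names, and the sketch ignores the step of actually taking the writer lock after the check.

```python
import threading


class PromotionConflict(Exception):
    """Raised when the data changed after the read transaction began."""


class Store:
    def __init__(self):
        self._lock = threading.Lock()  # guards _version
        self._version = 0

    def version(self) -> int:
        with self._lock:
            return self._version

    def commit_write(self) -> None:
        # Every committed write moves the dataset to a new version.
        with self._lock:
            self._version += 1

    def unchanged_since(self, version: int) -> bool:
        with self._lock:
            return self._version == version


class Txn:
    def __init__(self, store: Store):
        self.store = store
        self.mode = "READ"
        self.start_version = store.version()

    def promote(self) -> None:
        # Promotion is only safe if nobody committed since we started reading.
        if not self.store.unchanged_since(self.start_version):
            raise PromotionConflict("data changed since the read began")
        self.mode = "WRITE"


store = Store()
txn = Txn(store)
store.commit_write()  # another writer commits in the meantime
try:
    txn.promote()
except PromotionConflict as exc:
    print("abort:", exc)
```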


== Timeouts and deadlocks.

The SPARQL protocol operations within a transaction still need to be atomic with respect to the transaction. The client may now have multiple threads invoking operations - or pass the transaction id to another machine.

Multiple writers: For TDB, it's internally single writer with a lock for multiple writers so there is deadlock potential with two writers. We could add a "begin(WRITE)-or-bounce" operation.
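
A minimal sketch of what "begin(WRITE)-or-bounce" could look like, assuming a single writer lock: the second writer is refused immediately instead of waiting. The class and method names are hypothetical and this is not the TDB API.

```python
import threading


class WriterBusy(Exception):
    """Raised when a write transaction is already active."""


class SingleWriter:
    def __init__(self):
        self._write_lock = threading.Lock()

    def begin_write_or_bounce(self) -> None:
        # Non-blocking acquire: fail fast rather than queue behind the
        # current writer, so two writers cannot wait on each other.
        if not self._write_lock.acquire(blocking=False):
            raise WriterBusy("another write transaction is active")

    def end_write(self) -> None:
        self._write_lock.release()


dataset = SingleWriter()
dataset.begin_write_or_bounce()      # first writer gets in
try:
    dataset.begin_write_or_bounce()  # second writer is bounced
except WriterBusy as exc:
    print("bounced:", exc)
dataset.end_write()
```

A plain `threading.Lock` may be released by a thread other than the one that acquired it, which matters here because the requests of one remote transaction need not all arrive on the same server thread.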

(If a system is multiple writer, the pain point moves to system generated aborts, which don't happen in TDB.)


== ETags and optimistic transactions

Have you any thoughts on how this might play with ETags?

        Andy
