On 19/05/14 21:22, Stephen Allen wrote:
All,
I've been working on a design for remote transactions for SPARQL (initially
for the query and update endpoints, but most likely for GSP as well). My
initial draft is at [1]. I would appreciate any feedback, particularly in
the places where I have made notes (the @@ sections).
Although I tried to design it to not preclude distributed transactions, I
am intentionally limiting the specification to what is necessary for a
single server currently.
Implementation seems like it should be fairly straightforward, targeting
Fuseki 2 / TDB. There look to be a few bumps that need to be overcome
(particularly TDB's usage of ThreadLocal variables
in DatasetGraphTransaction), but there do not appear to be any
showstoppers. Although I have not started writing any code yet.
There is DatasetGraphTxn, direct from the StoreConnection, which is the
actual one-time transaction object that goes in the thread local.
It itself needs MRSW semantics.
I have started JENA-700 to track this work [2].
-Stephen
[1] http://people.apache.org/~sallen/sparql11-transaction/
[2] https://issues.apache.org/jira/browse/JENA-700
== Protocol
POST /transaction
GET /transaction/txid
Nice way to view a transaction - I'd have made the txid a large globally
unique number so there is no risk of guessing it by a third party.
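
As a rough illustration of the "large, unguessable txid" idea, here is a minimal sketch in Java. The class and method names (TxnIds, newTxnId, locationFor) are made up for this example and are not part of the draft or of Fuseki; the only assumption is that 128 random bits from SecureRandom is "large enough".

```java
import java.security.SecureRandom;

// Hypothetical helper, not part of the draft or of Fuseki.
final class TxnIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a 128-bit random id rendered as 32 lowercase hex characters. */
    static String newTxnId() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        StringBuilder sb = new StringBuilder(32);
        for (byte b : bytes) {
            // %02x on a Byte prints the unsigned value as two hex digits.
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Resource path for the transaction, following the /transaction/txid pattern. */
    static String locationFor(String txid) {
        return "/transaction/" + txid;
    }
}
```

The response to POST /transaction could then carry locationFor(newTxnId()) as the location of the new transaction resource.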
A container for transactions makes sense but the other mapping to HTTP
verbs seems odd to me.
The transaction state is part of the document at /transaction/txid.
So PUT/DELETE does not seem to me the right way to handle it, as the txid
(probably) does not exist after DELETE. A system may still wish to see the
state of a completed transaction after abort, though.
REST is state exchange so a POST or PUT of a new state document to the
server to commit or abort the transaction (or use a query string
parameter) would be my inclination.
POST is preferable because I think it is changing part of the state of
the transaction. Information like start of transaction time, who
started it, ... is not overwritten and can be guaranteed by the server.
PUT is about the whole document being replaced by whatever the client claims.
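
To make the POST-versus-PUT point concrete, here is a small sketch of a transaction document where the server owns most fields and a POST may only move the status. Everything here (TxnDocument, Status, post) is a hypothetical name for illustration, and the field set is an assumption, not something taken from the draft.

```java
import java.time.Instant;

// Hypothetical model of the document at /transaction/txid.
final class TxnDocument {
    enum Status { ACTIVE, COMMITTED, ABORTED }

    // Server-owned fields: fixed at creation, never taken from a client request.
    private final String txid;
    private final Instant startedAt;
    private final String startedBy;

    // The only part of the state a client may ask to change.
    private Status status = Status.ACTIVE;

    TxnDocument(String txid, Instant startedAt, String startedBy) {
        this.txid = txid;
        this.startedAt = startedAt;
        this.startedBy = startedBy;
    }

    /**
     * Handles a POSTed state change. Only ACTIVE -> COMMITTED or
     * ACTIVE -> ABORTED is accepted; returns false otherwise.
     */
    synchronized boolean post(Status requested) {
        if (status != Status.ACTIVE || requested == Status.ACTIVE) {
            return false;
        }
        status = requested;
        return true;
    }

    synchronized Status status() { return status; }
    String txid() { return txid; }
    Instant startedAt() { return startedAt; }
    String startedBy() { return startedBy; }
}
```

Because the document is kept rather than deleted, a GET after abort can still report the final state and the server-guaranteed metadata.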
"authenticated sessions" (if you mean two-way certificated https) are
very complicated to manage.
== Promotable transactions
Transactions that start READ and become WRITE aren't ruled out in TDB --
they would have the effect of potentially causing an abort at the point
when they get promoted, but any system has that possibility, either at this
point or at the commit point, with conflicting updates. In RDF, conflict is
messier because RDF triples do not correspond to application conceptual
entities, leading to unexpected conflicts, or is so fine grained that the
system gives less than acceptable performance.
With SPARQL operations being coarse grained read/query or write/update,
promotability makes more sense.
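
One way to picture promotion is a coarse version check: a READ transaction remembers the committed version it started from, and promotion only succeeds if no writer is active and nothing has been committed since. The sketch below is an illustration under that assumption only; VersionedStore and Txn are invented names and this is not how TDB is implemented.

```java
// Hypothetical, coarse-grained model of read-to-write promotion.
final class VersionedStore {
    private long committedVersion = 0;    // bumped by every write commit
    private boolean writerActive = false; // single-writer slot

    /** Starts a READ transaction pinned to the current committed version. */
    synchronized Txn begin() {
        return new Txn(committedVersion);
    }

    final class Txn {
        private final long versionAtBegin;
        private boolean writer = false;
        private boolean finished = false;

        private Txn(long versionAtBegin) {
            this.versionAtBegin = versionAtBegin;
        }

        /** Tries to become the writer; false means the caller should abort. */
        boolean promote() {
            synchronized (VersionedStore.this) {
                if (finished) return false;
                if (writer) return true;
                // Fail if another writer is active or data changed since we began.
                if (writerActive || committedVersion != versionAtBegin) return false;
                writerActive = true;
                writer = true;
                return true;
            }
        }

        void commit() { finish(true); }

        void abort() { finish(false); }

        private void finish(boolean publish) {
            synchronized (VersionedStore.this) {
                if (finished) return;
                finished = true;
                if (writer) {
                    if (publish) committedVersion++;
                    writerActive = false; // free the writer slot
                }
            }
        }
    }
}
```

With whole-dataset versions like this, any intervening write commit defeats a promotion, which matches the coarse read/query or write/update granularity of SPARQL operations.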
== Timeouts and deadlocks
The SPARQL protocol operations within a transaction still need to be
atomic with respect to the transaction. The client may now have
multiple threads invoking operations - or pass the transaction id to
another machine.
Multiple writers: For TDB, it's internally single writer with a lock for
multiple writers so there is deadlock potential with two writers. We
could add a "begin(WRITE)-or-bounce" operation.
(If a system is multiple writer, the pain point moves to system
generated aborts which don't happen in TDB).
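
A "begin(WRITE)-or-bounce" could look roughly like the following: try to take the single writer slot and report failure immediately (or after a bounded wait) instead of blocking. WriterGate and its methods are hypothetical names. A Semaphore is used here, rather than a thread-owned lock, on the assumption that a remote transaction may be continued from a different server thread than the one that began it.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical single-writer gate; not TDB's actual locking code.
final class WriterGate {
    // One permit = one active writer. A Semaphore has no owning thread,
    // so the permit can be released by a different thread than took it.
    private final Semaphore writerPermit = new Semaphore(1);

    /** Takes the writer slot if free; returns false ("bounce") otherwise. */
    boolean tryBeginWrite() {
        return writerPermit.tryAcquire();
    }

    /** Waits up to the given timeout for the writer slot before bouncing. */
    boolean tryBeginWrite(long timeout, TimeUnit unit) throws InterruptedException {
        return writerPermit.tryAcquire(timeout, unit);
    }

    /** Frees the writer slot; must be called exactly once per successful begin. */
    void endWrite() {
        writerPermit.release();
    }
}
```

A bounced client gets an immediate failure it can retry or report, so two clients each waiting on the other's write transaction cannot block indefinitely inside the server.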
== ETags and optimistic transactions
Have you any thoughts on how this might play with etags?
Andy