On 04/08/15 22:26, Stephen Allen wrote:
To my knowledge, the only argument for using GSP instead of just
query+update is performance/scalability. When I have encountered those
issues, I've attempted to fix the problem in query+update instead
(i.e. adding streaming support for update). However, parsing large
SPARQL INSERT DATA operations is still slower than parsing N-Triples
(not to mention RDF/Thrift). There are potential solutions for that (a
SPARQL/Thrift implementation, even if it only did INSERT/DELETE DATA as
binary and left queries as string blobs), but obviously that doesn't
exist yet.
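
To make the contrast concrete, here is a minimal sketch using the
RDFConnection API discussed later in this thread (endpoint URL and file
name are invented): the same triples arrive either as a SPARQL string
the server parses with the full SPARQL grammar, or as a streamed
N-Triples file sent over GSP.

    try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        // 1) SPARQL Update: the payload is parsed as SPARQL on the server.
        conn.update("INSERT DATA { <http://example/s> <http://example/p> <http://example/o> }");
        // 2) GSP upload: the file is streamed and parsed as N-Triples,
        //    a much simpler grammar - the performance point above.
        conn.load("data.nt");
    }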
...
One of the motivating features of jena-client was the ability to perform
large streaming updates (not just inserts/deletes) against a remote
store. This made up somewhat for the lack of remote transactions. But
maybe that isn't a strong argument when we could just go ahead and
implement remote transaction support (here is a proposal I haven't
worked on in over a year [3]).
GSP is very useful for managing data in a store when combined with the
union of named graphs as the default graph. Units of the overall data,
held as named graphs, can be deleted (bnodes included) or replaced
wholesale - see the sketch below.
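
As a sketch (dataset URL and graph names invented, using the
RDFConnection operations under discussion):

    try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        // Replace one unit wholesale; any bnodes in the old version go
        // with it, which a DELETE/INSERT pattern cannot easily express.
        conn.put("http://example/unit1", "unit1.ttl");
        // Remove another unit entirely.
        conn.delete("http://example/unit2");
    }

With the union default graph, queries see the updated contents of all
units without naming each graph.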
It's also useful when scripting management of the data: with curl/wget
you can manage a store from simple scripts. Being able to do that in the
same way in Java is helpful, so the user does not need two paradigms.
Fuseki2 provides streaming for uploads by GSP. RDFConnection has file
upload features, so the client side does not need to parse the file; it
just passes an InputStream to the HTTP layer.
RDFConnection also adds the natural REST operations on whole datasets.
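
A sketch of both points (endpoint URL and file names invented; method
names as in the RDFConnection API under discussion):

    try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        // File upload: the client streams the bytes of big.nt to the
        // server, which parses as it receives them.
        conn.load("http://example/graph", "big.nt");
        // REST-style operations on the dataset as a whole.
        conn.putDataset("dump.trig");    // PUT:    replace the dataset contents
        conn.loadDataset("more.trig");   // POST:   add to the dataset
        conn.deleteDataset();            // DELETE: clear the dataset (where supported)
    }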
Authentication: we should use the HttpOp code - one reason is that it
supports authentication for all HTTP verbs.
Jena-client's design is more like JDBC
in that the transaction operations are exposed on the Connection object.
If the user chooses not to use the transaction mechanism, then it will
default to using "auto-commit".
Agreed, and in fact there is an issue here with autocommit, streaming,
and SELECT queries. The ResultSet is passed out of the execSelect
operation but needs to be used inside the transaction; autocommit
defeats that.
This touches on the JDBC issue that drivers tend to execute and receive
all the results before the client can start working on the answers
(sometimes there are ways around this, to be used with care). The
underlying issue is badly behaved clients hogging resources on the
server.
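
A sketch of the failure mode (query text invented, auto-commit
semantics assumed as in the jena-client proposal):

    try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        ResultSet rs;
        try (QueryExecution qExec = conn.query("SELECT * { ?s ?p ?o }")) {
            rs = qExec.execSelect();
        }               // Auto-commit: transaction and QueryExecution end here,
        rs.hasNext();   // but the ResultSet escaped - reading it now is broken.
    }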
Some possibilities:
0/ Don't support autocommit. In the local case that is quite natural;
less so for the remote case, because HTTP is stateless.
(I looked more at the remote case - e.g. the local connection
implementation isolates results to get the same semantics as remote.)
1/ Autocommit cases receive the results completely before handing them
to the caller. Some idioms don't work in autocommit mode.
2/ An operation that makes sure the QueryExecution is inside a
transaction and is also closed.
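
For 1/, a sketch of detaching the results inside an auto-commit read
transaction (ResultSetFactory.copyResults materializes a ResultSet so it
stays valid after the QueryExecution closes; query text invented):

    ResultSetRewindable detached = Txn.calculateRead(conn, () -> {
        try (QueryExecution qExec = conn.query("SELECT * { ?s ?p ?o }")) {
            // Copy all rows while still inside the transaction.
            return ResultSetFactory.copyResults(qExec.execSelect());
        }
    });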
For 2/, in RDFConnection:

    public default void querySelect(Query query, Consumer<QuerySolution> rowAction) {
        Txn.executeRead(this, () -> {
            // The whole execution runs inside a read transaction, and the
            // QueryExecution closes before the transaction ends, so no
            // results can escape it.
            try (QueryExecution qExec = query(query)) {
                qExec.execSelect().forEachRemaining(rowAction);
            }
        });
    }
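
Used like this (query text invented), the caller never holds a
ResultSet that can outlive the transaction:

    conn.querySelect(QueryFactory.create("SELECT ?s { ?s ?p ?o }"),
                     row -> System.out.println(row.get("s")));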
By the way - I added explicit transaction support and some example usage.
Maybe we can use jena-client as a base to work from? If we feel we want to
add the separate GSP operations, then I think the extension point would be
to add a new GSP interface similar to Updater [5] (but lacking the generic
update query functionality).
I have no problem with jena-client as the starting point; I want to
understand its design first.
I'm not seeing what the separate interfaces and *Statement give the
application - maybe I'm missing something here - but it does seem more
complicated than just performing the operation directly. A *Statement is
still limited in scope to its connection, yet it can be passed out of
that scope.
Please remove the Sesame comments from the javadoc and documentation.
There's no need to put remarks about another community's implementation
choices - which can change - into javadoc and documentation. If you want
to write up the reasons, put them in a blog post somewhere; that also
makes them clearly time-specific.
We might want to consider a non-HTTP remote connection, or at least
design for the possibility. My motivation was initially more around
working with other people's published data (i.e. a long way away, not in
the same data centre).
Andy