On 04/08/15 22:26, Stephen Allen wrote:
To my knowledge, the only argument for using GSP instead of just
query+update would be performance/scalability.  Although, when I have
encountered those issues, I've attempted to fix the problem in query+update
instead (i.e. adding streaming support for update).  However, parsing large
SPARQL INSERT DATA operations is still slower than parsing NT (not to
mention RDF/Thrift).  There are potential solutions for that (a
sparql/thrift implementation, even if it only did INSERT/DELETE DATA as
binary and left queries as string blobs), but obviously that doesn't exist
yet.
...
One of the motivating features of jena-client was the ability to perform
large streaming updates (not just inserts/deletes) to a remote store.  This
made up somewhat for the lack of remote transactions.  But maybe that isn't
too great of an argument, when we could just go ahead and implement remote
transaction support (here is a proposal I haven't worked on in over a year
[3]).

GSP is very useful for managing data in a store when combined with a union of named graphs as the default. Units of the overall graph can be deleted (bnodes included) and replaced.
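Something like this on the client side (a sketch only - it assumes a
Fuseki dataset at http://localhost:3030/ds, and the RDFConnection method
names are the ones currently under discussion, so they may change):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

// Each named graph is a replaceable "unit"; with union default graph
// on, queries over the default graph see the merge of all the units.
try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
    Model unit = ModelFactory.createDefaultModel();
    unit.read("record-0001.ttl");
    conn.put("http://example/unit/0001", unit);   // Replace the whole unit.
    conn.delete("http://example/unit/0002");      // Drop a unit, bnodes and all.
}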

It's also useful when scripting management of the data: with curl/wget you can manage a store from simple scripts. Being able to do the same thing, in the same way, in Java is helpful, so the user does not need two paradigms.

Fuseki2 provides streaming updates for upload by GSP. RDFConnection has file-upload features, so the client side does not need to parse the file; it just passes an InputStream to the HTTP layer.
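For example (same sketch caveats as above):

// The file is streamed to the server as-is - no client-side parse.
try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
    conn.load("data.nt");                              // POST into the default graph.
    conn.load("http://example/unit/0003", "more.ttl"); // POST into a named graph.
}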

RDFConnection adds the natural REST ops on datasets.
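That is, GET/PUT/POST/DELETE on individual graphs and on the dataset
itself. Roughly (proposed names only; Dataset is org.apache.jena.query.Dataset):

import org.apache.jena.query.Dataset;

try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
    Dataset snapshot = conn.fetchDataset();   // GET the whole dataset.
    conn.putDataset("replacement.trig");      // PUT - replace the dataset contents.
}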


Authentication: we should use the HttpOp code - one reason is that it supports authentication for all HTTP verbs.
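One possible wiring, using Apache HttpClient 4.x (a sketch - it assumes
HttpOp keeps a settable default client; the exact hook may differ):

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.jena.riot.web.HttpOp;

// Register credentials once; HttpOp then applies the same
// authentication to every verb (GET, POST, PUT, DELETE, ...).
BasicCredentialsProvider credsProvider = new BasicCredentialsProvider();
credsProvider.setCredentials(new AuthScope("localhost", 3030),
    new UsernamePasswordCredentials("user", "password"));
CloseableHttpClient client = HttpClients.custom()
    .setDefaultCredentialsProvider(credsProvider)
    .build();
HttpOp.setDefaultHttpClient(client);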

Jena-client is more like JDBC
in that the transaction operations are exposed on the Connection object.
If the user chooses not to use the transaction mechanism then it will
default to using "auto-commit".

Agreed, and in fact there is an issue here with autocommit, streaming and SELECT queries. The ResultSet is passed out of the execSelect operation but needs to be consumed inside the transaction. Autocommit defeats that.
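To make the hazard concrete (a hypothetical fragment; "conn" is an
RDFConnection with autocommit on):

try (QueryExecution qExec = conn.query("SELECT * { ?s ?p ?o }")) {
    // Under autocommit, the implicit read transaction ends when
    // execSelect() returns ...
    ResultSet rs = qExec.execSelect();
    // ... yet the results may still be streaming from the server
    // here, outside any transaction.
    rs.forEachRemaining(row -> System.out.println(row));
}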

Which touches on the JDBC issue that drivers tend to execute the query and receive all the results before the client can start working on the answers (sometimes there are ways round this, to be used with care). The underlying concern is badly behaved clients hogging resources on the server.


Some possibilities:
0/ Don't support autocommit. In the local case that is quite natural; less so for the remote case, because HTTP is stateless.

(I have looked more at the remote case - e.g. the local connection implementation isolates results so as to get the same semantics as the remote one.)

1/ In the autocommit case, receive the results completely before handing them to the caller. Some idioms don't work in autocommit mode.

2/ An operation that makes sure the QueryExecution runs inside a transaction and is also closed.

In RDFConnection:

default void querySelect(Query query, Consumer<QuerySolution> rowAction) {
    // Run the whole SELECT inside a read transaction; try-with-resources
    // closes the QueryExecution before the transaction ends.
    Txn.executeRead(this, () -> {
        try (QueryExecution qExec = query(query)) {
            qExec.execSelect().forEachRemaining(rowAction);
        }
    });
}
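Usage would then be (sketch; the endpoint is illustrative):

try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
    Query query = QueryFactory.create("SELECT * { ?s ?p ?o }");
    // Row handling happens inside the read transaction, and the
    // QueryExecution is closed before querySelect returns.
    conn.querySelect(query, row -> System.out.println(row.get("s")));
}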

By the way - I added explicit transaction support and some example usage.

Maybe we can use jena-client as a base to work from?  If we feel we want to
add the separate GSP operations, then I think the extension point would be
to add a new GSP interface similar to Updater [5] (but lacking the generic
update query functionality).

I have no problem with jena-client as the starting point, but I want to understand its design first.

I'm not seeing what the separate interfaces and *Statement give the application - maybe I'm missing something here - it does seem to make things more complicated compared to just performing the operation. For *Statement, it's still limited in scope to the connection but can be passed out.

Please remove the Sesame comments in the javadoc and documentation. There's no need to put comments about another community's implementation choices, which can change, into javadoc and documentation. If you want to write up the reasons, have a blog item somewhere, which makes them more clearly time-specific.

We might want to consider a non-HTTP remote connection; at least design for the possibility. My motivation was initially more around working with other people's published data (i.e. a long way away, not in the same data centre).

        Andy
