On 04/08/15 22:26, Stephen Allen wrote:
To my knowledge, the only argument for using GSP instead of just
query+update is performance/scalability. When I have encountered those
issues, I've attempted to fix the problem in query+update instead
(i.e. adding streaming support for update). However, parsing large
SPARQL INSERT DATA operations is still slower than parsing N-Triples
(not to mention RDF/Thrift). There are potential solutions for that (a
SPARQL/Thrift implementation, even if it only did INSERT/DELETE DATA as
binary and left queries as string blobs), but obviously that doesn't
exist yet.
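
To make the contrast concrete, here is a minimal sketch using the
RDFConnection API discussed later in this thread (endpoint URL and file
name are invented): the same triples arrive either as a SPARQL string
the server parses with the full SPARQL grammar, or as a streamed
N-Triples file sent over GSP.

    try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        // 1) SPARQL Update: the payload is parsed as SPARQL on the server.
        conn.update("INSERT DATA { <http://example/s> <http://example/p> <http://example/o> }");
        // 2) GSP upload: the file is streamed and parsed as N-Triples,
        //    a much simpler grammar - the performance point above.
        conn.load("data.nt");
    }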
...
One of the motivating features of jena-client was the ability to perform
large streaming updates (not just inserts/deletes) against a remote
store. This made up somewhat for the lack of remote transactions. But
maybe that isn't a strong argument when we could just go ahead and
implement remote transaction support (here is a proposal I haven't
worked on in over a year [3]).
GSP is very useful for managing data in a store when combined with the
union of named graphs as the default graph. Units of the overall data,
held as named graphs, can be deleted (bnodes included) or replaced
wholesale - see the sketch below.
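
As a sketch (dataset URL and graph names invented, using the
RDFConnection operations under discussion):

    try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        // Replace one unit wholesale; any bnodes in the old version go
        // with it, which a DELETE/INSERT pattern cannot easily express.
        conn.put("http://example/unit1", "unit1.ttl");
        // Remove another unit entirely.
        conn.delete("http://example/unit2");
    }

With the union default graph, queries see the updated contents of all
units without naming each graph.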
It's also useful when scripting management of the data: with curl/wget
you can manage a store from simple scripts. Being able to do that in the
same way in Java is helpful, so the user does not need two paradigms.
Fuseki2 provides streaming for uploads by GSP. RDFConnection has file
upload features, so the client side does not need to parse the file; it
just passes an InputStream to the HTTP layer.
RDFConnection also adds the natural REST operations on whole datasets.
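
A sketch of both points (endpoint URL and file names invented; method
names as in the RDFConnection API under discussion):

    try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        // File upload: the client streams the bytes of big.nt to the
        // server, which parses as it receives them.
        conn.load("http://example/graph", "big.nt");
        // REST-style operations on the dataset as a whole.
        conn.putDataset("dump.trig");    // PUT:    replace the dataset contents
        conn.loadDataset("more.trig");   // POST:   add to the dataset
        conn.deleteDataset();            // DELETE: clear the dataset (where supported)
    }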
Authentication: we should use the HttpOp code - one reason is that it
supports authentication for all HTTP verbs.
Jena-client's design is more like JDBC
in that the transaction operations are exposed on the Connection object.
If the user chooses not to use the transaction mechanism, then it will
default to using "auto-commit".
Agreed, and in fact there is an issue here with autocommit, streaming,
and SELECT queries. The ResultSet is passed out of the execSelect
operation but needs to be used inside the transaction; autocommit
defeats that.
This touches on the JDBC issue that drivers tend to execute and receive
all the results before the client can start working on the answers
(sometimes there are ways around this, to be used with care). The
underlying issue is badly behaved clients hogging resources on the
server.
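
A sketch of the failure mode (query text invented, auto-commit
semantics assumed as in the jena-client proposal):

    try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
        ResultSet rs;
        try (QueryExecution qExec = conn.query("SELECT * { ?s ?p ?o }")) {
            rs = qExec.execSelect();
        }               // Auto-commit: transaction and QueryExecution end here,
        rs.hasNext();   // but the ResultSet escaped - reading it now is broken.
    }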
Some possibilities:
0/ Don't support autocommit. In the local case that is quite natural;
less so for the remote case, because HTTP is stateless.
(I looked more at the remote case - e.g. the local connection
implementation isolates results to get the same semantics as remote.)
1/ Autocommit cases receive the results completely before handing them
to the caller. Some idioms don't work in autocommit mode.
2/ An operation that makes sure the QueryExecution is inside a
transaction and is also closed.
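
For 1/, a sketch of detaching the results inside an auto-commit read
transaction (ResultSetFactory.copyResults materializes a ResultSet so it
stays valid after the QueryExecution closes; query text invented):

    ResultSetRewindable detached = Txn.calculateRead(conn, () -> {
        try (QueryExecution qExec = conn.query("SELECT * { ?s ?p ?o }")) {
            // Copy all rows while still inside the transaction.
            return ResultSetFactory.copyResults(qExec.execSelect());
        }
    });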
For 2/, in RDFConnection:

    public default void querySelect(Query query, Consumer<QuerySolution> rowAction) {
        Txn.executeRead(this, () -> {
            // The whole execution runs inside a read transaction, and the
            // QueryExecution closes before the transaction ends, so no
            // results can escape it.
            try (QueryExecution qExec = query(query)) {
                qExec.execSelect().forEachRemaining(rowAction);
            }
        });
    }
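
Used like this (query text invented), the caller never holds a
ResultSet that can outlive the transaction:

    conn.querySelect(QueryFactory.create("SELECT ?s { ?s ?p ?o }"),
                     row -> System.out.println(row.get("s")));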
By the way - I added explicit transaction support and some example usage.
Maybe we can use jena-client as a base to work from? If we feel we want to
add the separate GSP operations, then I think the extension point would be
to add a new GSP interface similar to Updater [5] (but lacking the generic
update query functionality).
I have no problem with jena-client as the starting point; I want to
understand its design first.
I'm not seeing what the separate interfaces and *Statement give the
application - maybe I'm missing something here - but it does seem more
complicated than just performing the operation directly. A *Statement is
still limited in scope to its connection, yet it can be passed out of
that scope.
Please remove the Sesame comments from the javadoc and documentation.
There's no need to put remarks about another community's implementation
choices - which can change - into javadoc and documentation. If you want
to write up the reasons, put them in a blog post somewhere; that also
makes them clearly time-specific.
We might want to consider a non-HTTP remote connection, or at least
design for the possibility. My motivation was initially more around
working with other people's published data (i.e. a long way away, not in
the same data centre).
Andy