On 26/01/11 15:19, Stephen Allen wrote:
Hi,

I am working on updating Parliament to ARQ 2.8.7 from 2.8.5.  I've noticed
that there are now two parallel SPARQL/Update mechanisms [1].  I'm guessing
the "submission" package refers to the SPARQL Update member submission [2]
and "request" is the new support added for the SPARQL 1.1 Update working
draft [3]?

Don't use "submission" - it's legacy, and it no longer exists in the development version.

At 2.8.7, SPARQL 1.1 Update is the update language, with some syntax support for the submission in the SPARQL 1.1 Update parser; so far, this has proved adequate. This is what you get for syntaxARQ, and syntaxARQ is the default.

At some point, it is likely that strict SPARQL 1.1 will become the default and an app will need to ask for syntaxARQ explicitly, as it does with query.

I'd like to implement the new mechanism for Parliament.  Previously, I was
able to subclass UpdateProcessorVisitor (now
UpdateProcessorSubmissionVisitor) in order to provide my own implementation
for certain methods.  As an example, Parliament is implemented as a
collection of triple stores, so it can safely read from one graph while
writing to another one (and thus avoid buffering all statements in an
ArrayList).  Also ARQ currently stores all WHERE clause bindings in an
ArrayList during an insert/delete operation, but I would like to make this
more memory efficient for large updates by serializing bindings to disk in a
temporary file (after it passes a threshold).

I'd like to do exactly that for ARQ - could you submit a patch to the Apache Jena JIRA for incorporation into the main code base?
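
To make the spill-to-disk idea concrete, here is a minimal sketch of a buffer that keeps rows in memory up to a threshold and writes the overflow to a temporary file. It is only an illustration under stated assumptions: SpillingBuffer and its methods are hypothetical names, not ARQ or Parliament classes, and each binding is assumed to be already serialized to a single line of text.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of ARQ or Parliament.
final class SpillingBuffer implements AutoCloseable {
    private final int threshold;                    // max rows held in memory
    private final List<String> memory = new ArrayList<>();
    private Path spillFile;                         // created lazily on first overflow
    private BufferedWriter writer;

    SpillingBuffer(int threshold) { this.threshold = threshold; }

    // Adds one row; rows beyond the threshold go to the temporary file.
    void add(String row) {
        if (row.indexOf('\n') >= 0 || row.indexOf('\r') >= 0)
            throw new IllegalArgumentException("row must be a single line");
        if (spillFile == null && memory.size() < threshold) {
            memory.add(row);
            return;
        }
        try {
            if (spillFile == null) {
                spillFile = Files.createTempFile("bindings", ".tmp");
                writer = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
            }
            writer.write(row);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasSpilled() { return spillFile != null; }

    // Visits rows in insertion order: in-memory rows first, then spilled rows.
    void forEach(Consumer<String> action) {
        memory.forEach(action);
        if (spillFile == null) return;
        try {
            writer.flush();
            try (BufferedReader reader =
                     Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
                for (String line = reader.readLine(); line != null; line = reader.readLine())
                    action.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Closes the writer and removes the temporary file, if one was created.
    @Override
    public void close() {
        try {
            if (writer != null) writer.close();
            if (spillFile != null) Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real patch would serialize bindings rather than strings and would probably pick the threshold from configuration; the point here is only that the caller sees one ordered sequence whether or not the data spilled.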

With an eye towards not copying a lot of ARQ code into my codebase, would it
be possible to change the class access modifier of
com.hp.hpl.jena.sparql.modify.UpdateEngineWorker to public instead of
package-private and make some of the private methods protected instead (also
com.hp.hpl.jena.sparql.modify.NodeTransformBNodesToVariables)?

Certainly - done in SourceForge SVN, and a new 2.8.8-SNAPSHOT is available with the changes. Let's identify which operations should be protected and which private - I made them all protected for now.

http://openjena.org/repo-dev/com/hp/hpl/jena/arq/

This includes a zip distribution as well as the usual maven artifacts.


The current implementation is a bit "direct" in places - the buffering in ArrayList being a good example. It makes as few assumptions about the storage layer as possible, but clearly that generality comes at a potential cost.

UpdateEngineWorker is a step towards an extension mechanism. I'd like to identify a set of "update ops" that can be used to build each of the SPARQL Update request types, so an implementation can add varying degrees of efficiency according to the amount of work it is willing to do.

If you have any insights here, I'd very much appreciate hearing them.

(You have presumably found the registry UpdateEngineRegistry - it all parallels the query engine extension design)
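
For readers who have not seen that pattern, a rough sketch of a factory registry follows. Everything in it is hypothetical: the type names, the method names, and the choice to consult the most recently added factory first are assumptions made for illustration, not the ARQ API, and a plain Object stands in for the store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; these are not ARQ classes.
interface UpdateEngineSketch {
    String execute(String request);
}

interface EngineFactorySketch {
    // True if this factory can build an engine for the given store.
    boolean accept(Object store);

    UpdateEngineSketch create(Object store);
}

final class EngineRegistrySketch {
    private final List<EngineFactorySketch> factories = new ArrayList<>();

    // Assumption of this sketch: later registrations take priority.
    void add(EngineFactorySketch factory) {
        factories.add(0, factory);
    }

    // Returns an engine from the first factory that accepts the store, or null.
    UpdateEngineSketch find(Object store) {
        for (EngineFactorySketch factory : factories) {
            if (factory.accept(store)) {
                return factory.create(store);
            }
        }
        return null;
    }
}
```

With this shape, a storage layer registers a factory that accepts only its own store type and falls through to the general engine for everything else.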

Thanks,
Stephen

P.S. I note that the following SPARQL/Update functions are specified in your
implementation/grammar: ADD, MOVE, COPY.  However I don't see them in the
latest working draft [3].  Presumably they are coming in the future?

They were only agreed at about the time of the last publication. But they are missing from the editor's working draft as well; I've just added a note to the SPARQL-WG wiki listing them as work items that need to be done. Thanks for catching this.

The syntax rules are:

ADD SILENT? GraphOrDefault TO GraphOrDefault
MOVE SILENT? GraphOrDefault TO GraphOrDefault
COPY SILENT? GraphOrDefault TO GraphOrDefault

GraphOrDefault    ::=           DEFAULT | GRAPH? IRIref
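
As a quick illustration of the rules above, the following sketch checks a single operation string against them with a regular expression. It is not the ARQ parser: GraphOpSyntax is a hypothetical name, IRIref is simplified to the <...> form only (prefixed names are not handled), and whitespace is required between tokens.

```java
import java.util.regex.Pattern;

// Hypothetical recognizer for illustration only; not the ARQ parser.
final class GraphOpSyntax {
    // GraphOrDefault ::= DEFAULT | GRAPH? IRIref   (IRIref simplified to <...>)
    private static final String GRAPH_OR_DEFAULT =
        "(?:DEFAULT|(?:GRAPH\\s+)?<[^<>\\s]*>)";

    // (ADD | MOVE | COPY) SILENT? GraphOrDefault TO GraphOrDefault
    private static final Pattern OPERATION = Pattern.compile(
        "\\s*(?:ADD|MOVE|COPY)(?:\\s+SILENT)?\\s+" + GRAPH_OR_DEFAULT
            + "\\s+TO\\s+" + GRAPH_OR_DEFAULT + "\\s*",
        Pattern.CASE_INSENSITIVE);

    // True if the whole string is one ADD, MOVE or COPY operation.
    static boolean isValid(String operation) {
        return OPERATION.matcher(operation).matches();
    }
}
```

For instance, "ADD SILENT DEFAULT TO <http://example/g>" and "MOVE GRAPH <http://example/a> TO DEFAULT" are accepted, while "ADD <http://example/a> <http://example/b>" is rejected because the TO keyword is missing.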

Given the separate

        Andy



[1] "com.hp.hpl.jena.sparql.modify.request" and
"com.hp.hpl.jena.sparql.modify.submission"
[2] http://www.w3.org/Submission/SPARQL-Update/
[3] http://www.w3.org/TR/sparql11-update/
