[ https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen Allen updated JENA-330:
-------------------------------
    Attachment: TestLargeUpdates.java
                config-null.ttl
                JENA-330_20121016.patch
The attached patch provides streaming SPARQL Update all the way through Fuseki
and into the underlying GraphStore.
When tested against GraphStoreNull, with jena-client generating a never-ending
INSERT DATA request, the system runs indefinitely. To reproduce this, use
config-null.ttl as your Fuseki configuration and run TestLargeUpdates
(TestLargeUpdates uses jena-client, which is available in the Experimental
branch). On my machine, connecting to localhost, I get a steady-state
throughput of about 32k triples per second.
Note that there are still some limits in this patch:
1) Triples involving nested blank nodes are emitted in inverted order
(innermost first). RDF lists are emitted mostly in order, except that the
initial statement pointing to the head of the list comes last.
Examples:
    :s :p [ :q [ :q :r ] ] .
becomes:
    _:b0 :q :r .
    _:b1 :q _:b0 .
    :s :p _:b1 .

    :s :p (1 2 3 4)
becomes:
    _:b2 rdf:first 1 .
    _:b2 rdf:rest _:b3 .
    _:b3 rdf:first 2 .
    _:b3 rdf:rest _:b4 .
    _:b4 rdf:first 3 .
    _:b4 rdf:rest _:b5 .
    _:b5 rdf:first 4 .
    _:b5 rdf:rest rdf:nil .
    :s :p _:b2 .
2) DatasetUpdateSink bypasses UpdateEngineFactory for INSERT DATA / DELETE
DATA, and calls .add(Quad) and .remove(Quad) directly on the DatasetGraph.
3) There is still a limit on the number of update operations that can appear in
a single update request. The Update() production in the grammar is recursive,
so a long sequence of operations quickly hits a StackOverflowError. Uncomment
the delete line in TestLargeUpdates to see this.
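
To illustrate limit 3), here is a minimal sketch of why a right-recursive rule uses one stack frame per operation while the equivalent loop does not. It is not the generated parser code from the patch: the class UpdateRecursionSketch and its methods are hypothetical, and operations are plain strings rather than parsed updates.

```java
import java.util.Iterator;

// Hypothetical illustration; not the javacc-generated ARQ parser.
class UpdateRecursionSketch {
    /** Mirrors a right-recursive rule: Update := Operation ( ';' Update )? */
    static int parseRecursive(Iterator<String> ops) {
        if (!ops.hasNext()) return 0;
        ops.next();                      // consume one operation
        return 1 + parseRecursive(ops);  // one stack frame per remaining operation
    }

    /** Loop form of the same rule: Update := Operation ( ';' Operation )* */
    static int parseIterative(Iterator<String> ops) {
        int count = 0;
        while (ops.hasNext()) {
            ops.next();
            count++;                     // constant stack depth
        }
        return count;
    }

    /** Lazily generates n placeholder operations, so nothing is buffered. */
    static Iterator<String> operations(final int n) {
        return new Iterator<String>() {
            private int i = 0;
            public boolean hasNext() { return i < n; }
            public String next() { return "INSERT DATA { :s :p " + (i++) + " }"; }
        };
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        System.out.println("iterative: " + parseIterative(operations(n)));
        try {
            System.out.println("recursive: " + parseRecursive(operations(n)));
        } catch (StackOverflowError e) {
            // Expected with default JVM stack sizes for a request this long.
            System.out.println("recursive: StackOverflowError");
        }
    }
}
```

Whether the real grammar can be restructured into the loop form without other side effects is a separate question; the sketch only shows where the stack depth comes from.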
As this is a pretty large patch, I didn't want to commit it without some
review. If someone could take a look, that would be great! I would also
appreciate opinions on whether 1) is important (I'm thinking it's not too
critical) and on how to solve 2).
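
For reference on limit 1), the sketch below shows one way this ordering can arise in a streaming emitter: a triple can only be sent once its object is known, so nested blank nodes are completed and emitted before the triple that refers to them. This is an assumption about the mechanism, not code from the patch; BlankNodeOrderSketch and Term are hypothetical names, and triples are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not the parser code from the patch.
class BlankNodeOrderSketch {
    /** Either a named term (":r") or an anonymous node with one predicate/object pair. */
    static final class Term {
        final String label;      // non-null for named terms
        final String predicate;  // non-null for anonymous nodes
        final Term object;
        Term(String label) { this.label = label; this.predicate = null; this.object = null; }
        Term(String predicate, Term object) { this.label = null; this.predicate = predicate; this.object = object; }
    }

    private int counter = 0;
    private final Consumer<String> sink;

    BlankNodeOrderSketch(Consumer<String> sink) { this.sink = sink; }

    /** Returns the label for a term, streaming any nested triples to the sink first. */
    String resolve(Term t) {
        if (t.label != null) return t.label;
        String obj = resolve(t.object);    // inner nodes are completed first
        String bnode = "_:b" + counter++;  // label assigned once the inner part is done
        sink.accept(bnode + " " + t.predicate + " " + obj + " .");
        return bnode;
    }

    /** Emits the outer triple only after its object has been resolved. */
    void triple(String s, String p, Term o) {
        String obj = resolve(o);
        sink.accept(s + " " + p + " " + obj + " .");
    }

    static List<String> demo() {
        List<String> out = new ArrayList<>();
        BlankNodeOrderSketch emitter = new BlankNodeOrderSketch(out::add);
        // :s :p [ :q [ :q :r ] ] .
        emitter.triple(":s", ":p", new Term(":q", new Term(":q", new Term(":r"))));
        return out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Run on the first example above, this emits the same three triples in the same inverted order. The set of triples is unchanged, so the ordering should only matter to a consumer that depends on document order.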
> Streaming support for SPARQL Update queries and streaming support for quads
> in INSERT DATA / DELETE DATA queries
> ----------------------------------------------------------------------------------------------------------------
>
> Key: JENA-330
> URL: https://issues.apache.org/jira/browse/JENA-330
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Stephen Allen
> Assignee: Stephen Allen
> Priority: Minor
> Attachments: config-null.ttl, JENA-330_20121016.patch,
> TestLargeUpdates.java
>
>
> The SPARQL Update parser currently parses all update queries into a single
> UpdateRequest object which holds them in memory. Instead the parser should
> insert queries into something like a Sink<Update>. Additionally it should
> put the quads from INSERT_DATA and DELETE_DATA into a Sink<Quad> instead of
> an ArrayList.
> This should allow the creation of a streaming update parser, which could be
> combined with JENA-309 to have full streaming into an underlying
> transactional store and the ability to handle arbitrarily large INSERT_DATA
> or DELETE_DATA queries (to the limits of the transaction system).
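
As a rough illustration of the description above, the following sketch contrasts collecting quads in a list with pushing them into a sink as they are parsed. The Sink interface here is a hypothetical stand-in defined locally (it is not Jena's Sink type), quads are plain strings, and parseInsertData only simulates a parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; not Jena's Sink, Quad, or DatasetGraph types.
class QuadSinkSketch {
    interface Sink<T> {
        void send(T item);
        void close();
    }

    /** Buffering approach: every quad is held in memory until parsing ends. */
    static final class BufferingSink implements Sink<String> {
        final List<String> buffer = new ArrayList<>();
        public void send(String quad) { buffer.add(quad); }
        public void close() { }
    }

    /** Streaming approach: each quad is handed to the store and then forgotten. */
    static final class StreamingSink implements Sink<String> {
        long applied = 0;
        public void send(String quad) { applied++; }  // stand-in for store.add(quad)
        public void close() { }
    }

    /** Stand-in for the parser: pushes n generated quads into the given sink. */
    static void parseInsertData(int n, Sink<String> sink) {
        for (int i = 0; i < n; i++) {
            sink.send(":g :s :p " + i);
        }
        sink.close();
    }

    public static void main(String[] args) {
        BufferingSink buffered = new BufferingSink();
        parseInsertData(1000, buffered);
        System.out.println("buffered quads held in memory: " + buffered.buffer.size());

        StreamingSink streamed = new StreamingSink();
        parseInsertData(1000, streamed);
        System.out.println("streamed quads applied: " + streamed.applied);
    }
}
```

With the buffering sink, memory use grows with the size of the request; with the streaming sink it stays constant, which is what makes arbitrarily large INSERT DATA / DELETE DATA requests feasible.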