[ 
https://issues.apache.org/jira/browse/JENA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Allen updated JENA-330:
-------------------------------

    Attachment: TestLargeUpdates.java
                config-null.ttl
                JENA-330_20121016.patch

The attached patch provides streaming SPARQL Update all the way through Fuseki 
and into the underlying GraphStore.

When tested against GraphStoreNull, with jena-client generating a never-ending 
INSERT DATA, the system runs indefinitely.  To reproduce this, use 
config-null.ttl as your Fuseki configuration and run TestLargeUpdates (it uses 
jena-client, which is available in the Experimental branch).  On my machine, 
connecting to localhost, I get a steady-state rate of about 32k triples per 
second.

Note that there are still some limitations in this patch:

1) The triples generated for nested blank nodes are emitted in inverted order 
(innermost first), and the triples for RDF lists are mostly in order, except 
that the initial statement pointing to the head of the list comes last.  
Examples:

   :s :p [ :q [ :q :r ] ] .
becomes:
   _:b0 :q :r .
   _:b1 :q _:b0 .
   :s :p _:b1 .

   :s :p (1 2 3 4)
becomes:
   _:b2 rdf:first 1 .
   _:b2 rdf:rest _:b3 .
   _:b3 rdf:first 2 .
   _:b3 rdf:rest _:b4 .
   _:b4 rdf:first 3 .
   _:b4 rdf:rest _:b5 .
   _:b5 rdf:first 4 .
   _:b5 rdf:rest rdf:nil .
   :s :p _:b2 .


2) DatasetUpdateSink bypasses UpdateEngineFactory for INSERT DATA / DELETE 
DATA, and calls .add(Quad) / .remove(Quad) directly on the DatasetGraph.

3) There is still a limit on the number of update operations that can appear in 
an update request (this is because Update() in the grammar is recursive, so a 
request with many operations quickly hits a StackOverflowError).  Uncomment the 
delete line in TestLargeUpdates to see this.
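
To illustrate limitation 3), here is a minimal, self-contained sketch of why a right-recursive production uses one stack frame per update operation while a loop does not.  It is not the generated ARQ parser: the class UpdateGrammarSketch and its methods are hypothetical names, and the "operations" are just strings standing in for parsed updates.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class UpdateGrammarSketch {

    /** Hypothetical stand-in for a stream of n update operations. */
    static Iterator<String> operations(final int n) {
        return new Iterator<String>() {
            private int produced = 0;
            @Override public boolean hasNext() { return produced < n; }
            @Override public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                produced++;
                return "INSERT DATA { <urn:s> <urn:p> " + produced + " }";
            }
        };
    }

    /**
     * Mirrors a right-recursive rule of the shape
     *   Update ::= Update1 ( ';' Update )?
     * Each operation adds a stack frame, so a very long request
     * can exhaust the thread stack.
     */
    static int countRecursive(Iterator<String> ops) {
        if (!ops.hasNext()) return 0;
        ops.next();                       // consume one operation
        return 1 + countRecursive(ops);   // recurse for the rest
    }

    /** Same language handled with a loop: constant stack depth. */
    static int countIterative(Iterator<String> ops) {
        int n = 0;
        while (ops.hasNext()) {
            ops.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countRecursive(operations(3)));
        System.out.println(countIterative(operations(1000000)));
    }
}
```

Whether the real grammar can be restructured as a loop without other side effects is an open question for the patch; the sketch only shows where the stack growth comes from.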

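For limitation 1), the following sketch shows one plausible reason the order comes out inverted: if a triple is pushed to the sink as soon as all of its terms are known, the triple that refers to a nested blank node can only be emitted after the nested node has been fully processed.  This is an illustration only, not the patch's parser code; BlankNodeOrderSketch, BNode, and the label allocation scheme are assumptions chosen to reproduce the first example above.

```java
import java.util.ArrayList;
import java.util.List;

public class BlankNodeOrderSketch {

    /** Hypothetical model of "[ predicate object ]"; object is a String term or another BNode. */
    static final class BNode {
        final String predicate;
        final Object object;
        BNode(String predicate, Object object) {
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final List<String> sink = new ArrayList<>();
    private int counter = 0;

    /** Returns the term's label, emitting triples for nested blank nodes first. */
    private String resolve(Object term) {
        if (term instanceof BNode) {
            BNode b = (BNode) term;
            String obj = resolve(b.object);        // inner node is finished first
            String label = "_:b" + counter++;      // label allocated once the node is complete
            sink.add(label + " " + b.predicate + " " + obj + " .");
            return label;
        }
        return (String) term;
    }

    /** Emits the outer triple last, because its object is only known at the end. */
    List<String> emit(String subject, String predicate, Object object) {
        String obj = resolve(object);
        sink.add(subject + " " + predicate + " " + obj + " .");
        return sink;
    }

    /** Models  :s :p [ :q [ :q :r ] ] . */
    static List<String> demo() {
        Object nested = new BNode(":q", new BNode(":q", ":r"));
        return new BlankNodeOrderSketch().emit(":s", ":p", nested);
    }

    public static void main(String[] args) {
        for (String triple : demo()) {
            System.out.println(triple);
        }
    }
}
```

Restoring document order in a streaming setting would mean buffering the outer triple's position or allocating the blank node label before descending, which is a trade-off against the goal of not holding data in memory.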

As this is a pretty large patch, I didn't want to commit it without some 
review.  If someone could take a look, that would be great!  I'd also 
appreciate opinions on whether 1) is important (I'm thinking it's not too 
critical) and on how to solve 2).

                
> Streaming support for SPARQL Update queries and streaming support for quads 
> in INSERT DATA / DELETE DATA queries
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-330
>                 URL: https://issues.apache.org/jira/browse/JENA-330
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Stephen Allen
>            Assignee: Stephen Allen
>            Priority: Minor
>         Attachments: config-null.ttl, JENA-330_20121016.patch, 
> TestLargeUpdates.java
>
>
> The SPARQL Update parser currently parses all update queries into a single 
> UpdateRequest object which holds them in memory.  Instead the parser should 
> insert queries into something like a Sink<Update>.  Additionally it should 
> put the quads from INSERT_DATA and DELETE_DATA into a Sink<Quad> instead of 
> an ArrayList.
> This should allow the creation of a streaming update parser, which could be 
> combined with JENA-309 to have full streaming into an underlying 
> transactional store and the ability to handle arbitrarily large INSERT_DATA 
> or DELETE_DATA queries (to the limits of the transaction system).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
