Tried to splice the two emails back together.  Comments inline:

On Wed, Jun 6, 2012 at 11:49 AM, Andy Seaborne <[email protected]> wrote:
>
> On Wed, Jun 6, 2012 at 11:01 AM, Stephen Allen <[email protected]> wrote:
>>
>> Hi Andy,
>>
>> I was curious about a statement you made on the user list yesterday:
>>
>> On Tue, Jun 5, 2012 at 2:43 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>> Updates don't log.  Form-submitted updates are buffered - the entire
>>> string is available to be printed - but ones sent as
>>> "application/sparql-update" are stream read (e.g. a large
>>> INSERT DATA { .... })
>>>
>>
>> I was looking at the parsing code, and it's true that
>> "application/x-www-form-urlencoded" updates are buffered into a String
>> early in the process, although it appears to me that for
>> "application/sparql-update", the ARQParser and SPARQLParser11 also
>> have to buffer all the update data in UpdateRequest objects (which for
>> the DATA methods are an in-memory list of Quads).
>
> Yes and no.  The input stream is directly parsed to a syntax tree, so the
> string (the body) of the POST is not available to be printed.  There is
> "just" the one copy.
>
> It also means that if there is a parse error, the request is not printed
> in a normal setup.
>
> This is a balance - HTTP is generally about validate-execute, and it is
> also good to know the operation is valid before starting (not everything
> is transactional).
>
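To make the trade-off concrete, here is a toy sketch (plain Java, not the actual Fuseki request-handling code; the class and method names are invented for illustration) of the two paths being contrasted: buffering the whole POST body as a String so it stays available for logging and error reports, versus letting the parser consume the stream directly so there is only ever one copy of the data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only - not Fuseki code.  Contrasts the two request
// paths discussed above: buffer-whole-body (loggable, printable on parse
// error) versus stream-parse (single copy, body gone if parsing fails).
public class UpdateBodyHandling {

    // Form-submitted path: read the entire body into a String first.
    // The request text survives for logging and error reporting.
    static String bufferWholeBody(InputStream body) throws IOException {
        return new String(body.readAllBytes(), StandardCharsets.UTF_8);
    }

    // "application/sparql-update" path: the "parser" consumes the stream
    // directly; here it just counts statements terminated by ';'.
    // There is only ever one copy of the data in flight.
    static int streamParse(InputStream body) throws IOException {
        int statements = 0;
        int c;
        while ((c = body.read()) != -1) {
            if (c == ';')
                statements++;
        }
        return statements;
    }

    public static void main(String[] args) throws IOException {
        byte[] req = "INSERT DATA { <s> <p> <o> } ;".getBytes(StandardCharsets.UTF_8);
        System.out.println(bufferWholeBody(new ByteArrayInputStream(req)));
        System.out.println(streamParse(new ByteArrayInputStream(req)));
    }
}
```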
Yes, we would need to validate the entire request if the underlying store
is not transactional and we want to attempt to be somewhat atomic
(although the WD says SHOULD on this matter).  We'd need to spill to disk
if the request is large.

> Maybe it should be less clever and do string-log-parse-execute.
>
>> I have been thinking about how to make this process streaming, but I
>> didn't know whether it made sense to try to modify the JavaCC parsers
>> to be streaming or to build a hybrid parser for just SPARQL Update.
>> This hybrid would handle INSERT DATA and DELETE DATA in a streaming
>> manner, and delegate regular updates to the existing parser.  Do you
>> have any thoughts or advice?
>>
>
> There is a tension between operations of just INSERT/DELETE DATA and
> combined, complex multi-part operations.  The latter leans towards
> complex parsing of whole sequences of actions before any operation.
>

Well, maybe we make it look like streaming, even if we're spilling out to
disk.  If we want to maintain atomicity (or an approximation of it), then
we can dump the request to disk as we are validating, and then replay it
for the actual update.  For transactional stores, we can dispense with
this and do validation and insertion/deletion at the same time.

> So I think a separate, streaming, bulk-focused parser for INSERT DATA
> and DELETE DATA would be the way to go (and an update processor, etc.).
>
> javacc sharing is not something I have ever managed to get working to
> separate the grammar from actions without distorting the entire thing
> to be dominated by that design goal.  I have tried to remove all code
> from the parser and just use events: the parser is streaming, and the
> superclass code builds the state.  It could be redone to pass in a
> builder rather than use the superclass.  SPARQL Update does include the
> whole of SPARQL Query pattern matching.
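To sketch the dump-to-disk-and-replay idea in code (a hypothetical illustration, not Fuseki's implementation - the class name, the toy line-based "validator", and the stand-in execute step are all invented): for a non-transactional store, the request is copied to a temp file while each operation is validated; only if the whole request validates is the file replayed against the store.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch of validate-while-spilling for a non-transactional store:
// pass 1 validates each operation and spills it to disk; pass 2 replays
// the known-valid request.  A transactional store could skip the spill
// and apply operations as they validate.
public class SpillAndReplay {

    // Toy validator: every non-blank line must start with INSERT or DELETE.
    static boolean validOp(String line) {
        return line.isBlank() || line.startsWith("INSERT") || line.startsWith("DELETE");
    }

    static List<String> execute(Reader request) throws IOException {
        Path spill = Files.createTempFile("update", ".ru");
        try (BufferedReader in = new BufferedReader(request);
             BufferedWriter out = Files.newBufferedWriter(spill)) {
            String line;
            while ((line = in.readLine()) != null) {  // validate + spill, one pass
                if (!validOp(line))
                    throw new IOException("parse error: " + line);
                out.write(line);
                out.newLine();
            }
        }
        // Replay phase: the whole request is known-valid, apply it now.
        List<String> applied = new ArrayList<>();
        try (BufferedReader replay = Files.newBufferedReader(spill)) {
            String line;
            while ((line = replay.readLine()) != null)
                if (!line.isBlank())
                    applied.add(line);                // stand-in for the real update
        } finally {
            Files.delete(spill);
        }
        return applied;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(execute(new StringReader("INSERT a\nDELETE b\n")));
    }
}
```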
>
> So it's a bit of a mess from the multi-use point of view, but the spec
> is stable, so a copy is tolerable (if somewhat irritating from an
> aesthetic POV).
>
> Actually, this looks like the tip of a general need for a non-SPARQL
> (or SPARQL+ if you prefer) remote interface to Fuseki.  See also the
> users@ question about transactions across several Fuseki operations.

Transactions across multiple HTTP requests (even query/update) are one of
the features I'd like to support in jena-client.  Also remote query
cancellation.  And then probably lots of other stuff like query metrics,
server health, etc., but that sort of thing seems custom to Fuseki.  JMX
might work well here.

> So maybe there is a language lurking around here somewhere.  It would
> stream-execute.  More fine-grained than GSP, less than full SPARQL
> Update.
>
>   INSERT DATA, DELETE DATA
>   BEGIN/COMMIT/ABORT
>   CLEAR/DROP, LOAD
>   CREATE DATASET, DROP DATASET
>   UNMOUNT DATASET
>   MOUNT DATASET
>   BACKUP DATASET
>   ...

I see where you're going with the BEGIN/COMMIT/ABORT.  I think some way
of doing transactions and query identification (for tracking and
cancellation purposes) is something whose time has arrived, and we should
try out some implementations (with an eye towards future
standardization).  The DATASET commands might be a little too
implementation-specific for a standard, but they would be a cool feature
for Fuseki.

However, I'm not convinced that we should have the overlapping SPARQL
Update commands.  It seems that we probably want to support Update
properly, and then there wouldn't be much use for the overlap.  SPARQL
1.1 Update appears amenable to streaming if we put some brainpower on it.
I don't have much experience with JavaCC, but I am willing to learn.

>>
>> Another much simpler (although perhaps less satisfying) option would
>> be to replace the ArrayList in QuadAcc with a DataBag.
>>
>
> Partially - aren't we going to want to disallow other SPARQL operations
> that aren't wanted when streaming?
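A rough sketch of how such a store language could stream-execute (hypothetical code - the command names follow the list above, but everything else here is invented): each line is dispatched the moment it is read, so BEGIN/COMMIT can bracket work without buffering the whole script first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stream-executing runner for a line-oriented store language.
// Commands are applied as they arrive; nothing is parsed ahead of need.
public class StoreScriptRunner {

    final List<String> log = new ArrayList<>();

    void dispatch(String line) {
        String cmd = line.strip();
        if (cmd.isEmpty())
            return;
        String head = cmd.split("\\s+")[0];
        if (head.equals("BEGIN") || head.equals("COMMIT") || head.equals("ABORT"))
            log.add(head.toLowerCase());              // transaction control
        else if (head.equals("INSERT") || head.equals("DELETE")
                || head.equals("LOAD") || head.equals("CLEAR")
                || head.equals("DROP"))
            log.add("op: " + cmd);                    // stand-in for real work
        else
            throw new IllegalArgumentException("unknown command: " + cmd);
    }

    void run(BufferedReader script) throws IOException {
        String line;
        while ((line = script.readLine()) != null)
            dispatch(line);                           // streaming, not batch
    }

    public static void main(String[] args) throws IOException {
        StoreScriptRunner r = new StoreScriptRunner();
        r.run(new BufferedReader(new StringReader(
                "BEGIN\nINSERT DATA { <s> <p> <o> }\nCOMMIT\n")));
        System.out.println(r.log);
    }
}
```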
> That's where I got to about a special store language.

I think we would want to allow all possible SPARQL Update operations
while supporting streaming.  As an example, the current app I'm
developing generates triples/quads to be streamed, while at the same time
interleaving update queries.  Separate requests mean you lose atomicity.

> A SPARQL Update can be several operations in a request.  It's not just
> what can be done but what needs to be made not possible.
>
> Is there an order-preserving DataBag impl?  This is also used to
> serialize updates as well.

Yes, the not-so-well-named DefaultDataBag [1].

> Feels like both small/general/nice-errors and
> large/stream/less-nice-errors are pulling in different directions a
> little.
>

Doing the DataBag implementation seems like the most expedient way
forward.  But as it does impose an unnecessary cost on transactional
stores, I think the improved Update parser will eventually be what we
want.

-Stephen

[1] http://svn.apache.org/repos/asf/jena/trunk/jena-arq/src/main/java/org/openjena/atlas/data/DefaultDataBag.java
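P.S. A minimal sketch of the order-preserving spill behaviour in the spirit of DefaultDataBag (illustrative code only, not the Jena/Atlas implementation): items accumulate in memory up to a threshold, then overflow to a temp file; iteration yields spilled items first and the in-memory tail last, so insertion order is preserved throughout.  (A real implementation would stream the spill file during iteration rather than reading it all back in, as is done here for brevity.)

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy order-preserving, spill-to-disk bag: memory buffer + append-only
// spill file.  Spills happen in insertion order, so a file-then-memory
// scan reproduces the original order.
public class SpillBag implements Iterable<String>, AutoCloseable {
    private final int threshold;
    private final List<String> memory = new ArrayList<>();
    private Path spill;                       // created lazily on first overflow

    SpillBag(int threshold) { this.threshold = threshold; }

    void add(String item) {
        memory.add(item);
        if (memory.size() >= threshold)
            flush();
    }

    private void flush() {
        try {
            if (spill == null)
                spill = Files.createTempFile("bag", ".txt");
            try (BufferedWriter w = Files.newBufferedWriter(spill, StandardOpenOption.APPEND)) {
                for (String s : memory) { w.write(s); w.newLine(); }
            }
            memory.clear();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    @Override public Iterator<String> iterator() {
        List<String> all = new ArrayList<>();
        try {
            if (spill != null)
                all.addAll(Files.readAllLines(spill));  // spilled items first
        } catch (IOException e) { throw new UncheckedIOException(e); }
        all.addAll(memory);                             // then the in-memory tail
        return all.iterator();
    }

    @Override public void close() {
        try {
            if (spill != null) Files.delete(spill);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void main(String[] args) {
        SpillBag bag = new SpillBag(2);
        for (String q : new String[] { "q1", "q2", "q3", "q4", "q5" })
            bag.add(q);
        for (String q : bag)
            System.out.println(q);
        bag.close();
    }
}
```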
