Andy,

Thanks for the comments, very helpful. It is unfortunate that I didn't look at this earlier in the year, but indeed my proposed change would be relatively minor for the end user, and I've figured out a solution that stays within the current spec!
More comments inline:

On Thu, Nov 29, 2012 at 2:36 PM, Andy Seaborne wrote:
> Hi Stephen,
>
> Adding requirements to the official SPARQL grammar to be parsable in particular
> ways is adding too much. The goal (SPARQL 1.0 and 1.1) is it's LL(1) AKA
> simple technically and that communicates. People can implement the language
> in different ways and the grammar defines the language but it does not
> prescribe the implementation.

In hindsight I would have argued that the optional terminating semicolon should be eliminated to stay LL(1) without recursion.

> There may be other ways to do streaming ... at the moment the parser builds
> a syntax tree (good for printing out request) but it could be event
> generating without a syntax builder built in - non-trivial change.
>
> So what is important to stream?
>
> 1/ INSERT DATA (and DELETE DATA)
>
> Maybe the data should not end up in the parser tree but a data bag? It can
> be pulled back in for small requests and printing.
>
> 2/ Per operation? It could have a mode to emit each operation as it goes
> along.
>

All of this is implemented now and checked into a branch (streaming-update). I've spent a couple of months on this, and it is very close to being finished. Just point 1) in my comment on JENA-330 is left to address!

> Possible grammar idea below ...
>
>
>> The grammar for Update in the SPARQL 1.1 PR is as follows:
>>
>> [29] Update ::= Prologue ( Update1 ( ';' Update )? )?
>
>
> W3C process foo:
>
> The working group ends Dec 31. There is no chance of an extension - we're
> under pressure to finish (not unreasonable ...) Revisions, errata etc will
> be noted afterwards.
>
> If the language changes, the spec would have to go back to another Last
> Call. Implementations, of which there are many, would be affected.
>
> Just changing the grammar, not the language, could be argued to not affect
> implementations because it's the language that matters.
> But that argument
> also argues for no change (because the grammar isn't so important to need a
> Last Call). Morally, there would be a strong case for another Last Call on
> a grammar change.
>

Totally reasonable!

>
>> This is currently implemented in our JavaCC parser as:
>>
>> Prologue() (Update1() ( <SEMICOLON> Update() )? )?
>>
>> Unfortunately, the best non-recursive solution I was able to come up with
>> was:
>>
>> Prologue() ( Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
>> Update1() )* ( <SEMICOLON> )? )?
>
>
> Why not add a final optional prologue if the last SEMI is seen?
>
> .... ( <SEMICOLON> (Prologue())? )? )?
>
> [untested]
>

I had been trying that to no avail because of the unlimited amount of lookahead required (which defeats the streaming effort). Syntactic lookahead to the rescue, however, and I was able to write it as:

    Prologue()
    ( Update1()
      (
        // This syntactic lookahead is necessitated by the optional trailing semicolon and prologue
        LOOKAHEAD( <SEMICOLON> Prologue() ( <LOAD> | <CLEAR> | <DROP> | <ADD> | <MOVE> | <COPY> | <CREATE> | <WITH> | <DELETE> | <INSERT> | <USING> | <INSERT_DATA> | <DELETE_DATA> | <DELETE_WHERE> ) )
        <SEMICOLON> Prologue() Update1()
      )*
      ( <SEMICOLON> Prologue() )?
    )?

>
>> This is *almost* equivalent to the grammar in the spec, except for one
>> detail: it does not allow a trailing Prologue(), which the recursive
>> definition allows. I can't seem to get any closer, mainly due to that
>> optional semicolon and optional trailing prologue (although you cannot
>> have a trailing semicolon if you have a lone trailing prologue).
>>
>> The more I look at the problem, the more I tend to think that maybe
>> the spec's Update grammar is faulty. I believe it should not allow
>> trailing prologues. It also should not allow just a prologue and
>> nothing else (Query forbids this).
>> Examples of queries that I think
>> should be invalid (but are not currently):
>>
>> ==========
>> PREFIX : <http://example.org/>
>> ==========
>> PREFIX : <http://example.org/>
>> insert data { } ;
>> PREFIX : <http://example.org/>
>> ==========
>>
>> Additionally, I would argue that the text of the Update spec [1]
>> contradicts the existing grammar. Specifically the definition in
>> section 3:
>> "A request is a sequence of operations and is terminated by
>> EOF (End of File). Multiple operations are separated by a ';'
>> (semicolon) character. A semicolon after the last operation
>> in a request is optional."
>
>
> Sequences can be zero length :-)
>

Hah, I don't like it still :)

>
>>
>> A prologue by itself is not an operation as defined in section 4.3 [2].
>>
>> I would propose to the working group that we instead adopt the
>> following grammar:
>>
>> [29] Update ::= Prologue Update1 ( ';' Prologue Update1 )* ( ';' )?
>>
>> This could be easily represented in JavaCC as:
>>
>> Prologue() Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
>> Update1() )* ( <SEMICOLON> )?
>>
>> The trailing semicolon seems to force us into using an LL(2) parser.
>> I cannot see a way to write this grammar in LL(1).
>>
>> I have three questions that would be nice to have answered before I
>> post a comment to the WG:
>>
>> 1) Is there a non-recursive way to write the existing rule 29 that
>> exactly matches the semantics of the spec?
>> 2) Is there a way to write my proposed rule 29 as LL(1) (even if it has
>> to use recursion)?
>> 3) Would the RDF WG be open to changing the grammar at this point? I
>> know it is in PR stage, but this would be feedback from attempting
>> implementation.
>>
>> -Stephen
>>
>> [1] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#updateLanguage
>> [2]
>> http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#formalModelGraphUpdate
>>
>
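
To make the shape of the language in the existing rule 29 concrete, here is a toy hand-written loop in plain Java. It is only a sketch under loud assumptions: it is not the JavaCC grammar and not the code on the streaming-update branch; the class name UpdateLoopSketch and the method countOperations are made up for illustration; each operation is shrunk to a single keyword token, and each PREFIX or BASE declaration to a single token. It accepts a prologue, then optionally one operation followed by any number of "; prologue operation" groups, with an optional final "; prologue", and it handles one operation at a time without recursion.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy recognizer for the request shape of rule 29.
// Assumption: tokens are whitespace-separated; an operation is one keyword,
// and "PREFIX" / "BASE" each stand in for a whole prologue declaration.
final class UpdateLoopSketch {
    private static final Set<String> OPS = new HashSet<>(Arrays.asList(
        "LOAD", "CLEAR", "DROP", "ADD", "MOVE", "COPY",
        "CREATE", "WITH", "DELETE", "INSERT"));

    // Consume zero or more prologue declarations starting at pos.
    private static int skipPrologue(String[] tokens, int pos) {
        while (pos < tokens.length
                && (tokens[pos].equals("PREFIX") || tokens[pos].equals("BASE"))) {
            pos++;
        }
        return pos;
    }

    // True if the token at pos can start an operation.
    private static boolean startsOperation(String[] tokens, int pos) {
        return pos < tokens.length && OPS.contains(tokens[pos]);
    }

    // Returns the number of operations in the request, or throws
    // IllegalArgumentException if the request does not match the rule.
    static int countOperations(String request) {
        String trimmed = request.trim();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");

        int pos = skipPrologue(tokens, 0);
        int ops = 0;
        if (startsOperation(tokens, pos)) {
            pos++;
            ops++;
            // Each ';' is followed by a prologue; only then do we decide
            // whether another operation follows or the request has ended.
            while (pos < tokens.length && tokens[pos].equals(";")) {
                pos = skipPrologue(tokens, pos + 1);
                if (!startsOperation(tokens, pos)) {
                    break; // trailing ';' with an optional trailing prologue
                }
                pos++;
                ops++;
            }
        }
        if (pos != tokens.length) {
            throw new IllegalArgumentException(
                "unexpected token at " + pos + ": " + tokens[pos]);
        }
        return ops;
    }
}
```

With this toy, countOperations("PREFIX") returns 0 and countOperations("PREFIX INSERT ; PREFIX") returns 1, which are the two cases I would prefer to be invalid, while "INSERT ; ; DELETE" is rejected.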
