All, Some thoughts on Update grammar. Would love to get other's opinions before posting a comment to the SPARQL WG.
I have been attempting to implement streaming updates in Jena. Part of that effort has been rewriting recursive grammar definitions to have no recursion (thus avoiding a stack overflow when there are a large number of Update operations in a single Update Request). This in some cases may mean an LL(2) parser. The grammar for Update in the SPARQL 1.1 PR is as follows: [29] Update ::= Prologue ( Update1 ( ';' Update )? )? This is currently implemented in our JavaCC parser as: Prologue() (Update1() ( <SEMICOLON> Update() )? )? Unfortunately, the best I non-recursive solution was able to come up with was: Prologue() ( Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue() Update1() )* ( <SEMICOLON> )? )? This is *almost* equivalent to the grammar in the spec, except for one detail: it does not allow a trailing Prologue(), which the recursive definition allows. I can't seem to get any closer, mainly due to that optional semicolon and optional trailing prologue (although you cannot have a trailing semicolon if you have a lone trailing prologue). The more I look at the problem, the more I tend to think that maybe the spec's Update grammar is faulty. I believe it should not allow trailing prologues. It also should not allow just a prologue and nothing else (Query forbids this). Examples of queries that I think should be invalid (but are not currently): ========== PREFIX : <http://example.org/> ========== PREFIX : <http://example.org/> insert data { } ; PREFIX : <http://example.org/> ========== Additionally, I would argue that the text of the Update spec [1] contradicts the existing grammar. Specifically the definition in section 3: "A request is a sequence of operations and is terminated by EOF (End of File). Multiple operations are separated by a ';' (semicolon) character. A semicolon after the last operation in a request is optional." A prologue by itself is not an operation as defined in section 4.3 [2]. I would propose to the working group that we instead adopt the following grammar: [29] Update ::= Prologue Update1 ( ';' Prologue Update1 )* ( ';' )? This could be easily represented in JavaCC as: Prologue() Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue() Update1() )* ( <SEMICOLON> )? The trailing semicolon seems to force us into using an LL(2) parser. I cannot see a way to write this grammar in LL(1). I have three questions that would be nice to have answered before I post a comment to the WG: 1) Is there a non-recursive way to write the existing rule 29 that exactly matches the semantics of the spec? 2) Is there a way to write my proposed rule 29 as LL(1) (even if has to use recursion)? 3) Would the RDF WG be open to changing the grammar at this point? I know it is in PR stage, but this would be feedback from attempting implementation. -Stephen [1] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#updateLanguage [2] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#formalModelGraphUpdate
