All,

Some thoughts on Update grammar.  Would love to get other's opinions
before posting a comment to the SPARQL WG.

I have been attempting to implement streaming updates in Jena.  Part
of that effort has been rewriting recursive grammar definitions to
have no recursion (thus avoiding a stack overflow when there are a
large number of Update operations in a single Update Request).  This
in some cases may mean an LL(2) parser.

The grammar for Update in the SPARQL 1.1 PR is as follows:

   [29] Update ::= Prologue ( Update1 ( ';' Update )? )?

This is currently implemented in our JavaCC parser as:

   Prologue() (Update1() ( <SEMICOLON> Update() )? )?

Unfortunately, the best I non-recursive solution was able to come up with was:

   Prologue() ( Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
Update1() )* ( <SEMICOLON> )? )?

This is *almost* equivalent to the grammar in the spec, except for one
detail: it does not allow a trailing Prologue(), which the recursive
definition allows.  I can't seem to get any closer, mainly due to that
optional semicolon and optional trailing prologue (although you cannot
have a trailing semicolon if you have a lone trailing prologue).

The more I look at the problem, the more I tend to think that maybe
the spec's Update grammar is faulty.  I believe it should not allow
trailing prologues.  It also should not allow just a prologue and
nothing else (Query forbids this).  Examples of queries that I think
should be invalid (but are not currently):

==========
PREFIX : <http://example.org/>
==========
PREFIX : <http://example.org/>
insert data { } ;
PREFIX : <http://example.org/>
==========

Additionally, I would argue that the text of the Update spec [1]
contradicts the existing grammar.  Specifically the definition in
section 3:
    "A request is a sequence of operations and is terminated by
    EOF (End of File). Multiple operations are separated by a ';'
    (semicolon) character. A semicolon after the last operation
    in a request is optional."

A prologue by itself is not an operation as defined in section 4.3 [2].

I would propose to the working group that we instead adopt the
following grammar:

   [29] Update ::= Prologue Update1 ( ';' Prologue Update1 )* ( ';' )?

This could be easily represented in JavaCC as:

   Prologue() Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
Update1() )* ( <SEMICOLON> )?

The trailing semicolon seems to force us into using an LL(2) parser.
I cannot see a way to write this grammar in LL(1).

I have three questions that would be nice to have answered before I
post a comment to the WG:

1) Is there a non-recursive way to write the existing rule 29 that
exactly matches the semantics of the spec?
2) Is there a way to write my proposed rule 29 as LL(1) (even if has
to use recursion)?
3) Would the RDF WG be open to changing the grammar at this point?  I
know it is in PR stage, but this would be feedback from attempting
implementation.

-Stephen

[1] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#updateLanguage
[2] 
http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#formalModelGraphUpdate

Reply via email to