Andy,

Thanks for the comments, very helpful.  It is unfortunate that I
didn't look at this earlier in the year, but indeed my proposed change
would be relatively minor to the end user, and I've figured out a
solution staying within the current spec!

More comments inline:


On Thu, Nov 29, 2012 at 2:36 PM, Andy Seaborne wrote:
> Hi Stephen,
>
> Adding requirements to the official SPARQL grammar to be parsable in
> particular ways is adding too much.  The goal (SPARQL 1.0 and 1.1) is that
> it's LL(1), AKA technically simple, and that it communicates.  People can
> implement the language
> in different ways and the grammar defines the language but it does not
> prescribe the implementation.

In hindsight I would have argued that the optional terminating
semicolon should be eliminated to stay LL(1) without recursion.

> There may be other ways to do streaming ... at the moment the parser builds
> a syntax tree (good for printing out request) but it could be event
> generating without a syntax builder built in - non-trivial change.
>
> So what is important to stream?
>
> 1/ INSERT DATA (and DELETE DATA)
>
> Maybe the data should not end up in the parser tree but a data bag?  It can
> be pulled back in for small requests and printing.
>
> 2/ Per operation?  It could have a mode to emit each operation as it goes
> along.
>

All of this is implemented now and checked into a branch
(streaming-update).  I've spent a couple of months on this, and it is
very close to being finished.  Only point 1) in my comment on JENA-330
is left to address!

> Possible grammar idea below ...
>
>
>> The grammar for Update in the SPARQL 1.1 PR is as follows:
>>
>>     [29] Update ::= Prologue ( Update1 ( ';' Update )? )?
>
>
> W3C process foo:
>
> The working group ends Dec 31.  There is no chance of an extension - we're
> under pressure to finish (not unreasonable ...)  Revisions, errata etc will
> be noted afterwards.
>
> If the language changes, the spec would have to go back to another Last
> Call.  Implementations, of which there are many, would be affected.
>
> Just changing the grammar, not the language, could be argued to not affect
> implementations because it's the language that matters.  But that argument
> also argues for no change (because the grammar isn't so important to need a
> Last Call).  Morally, there would be a strong case for another Last Call on
> a grammar change.
>

Totally reasonable!

>
>> This is currently implemented in our JavaCC parser as:
>>
>>     Prologue() (Update1() ( <SEMICOLON> Update() )? )?
>>
>> Unfortunately, the best non-recursive solution I was able to come up with
>> was:
>>
>>     Prologue() ( Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
>>         Update1() )* ( <SEMICOLON> )? )?
>
>
> Why not add a final optional prologue if the last SEMI is seen?
>
>     .... ( <SEMICOLON> (Prologue())? )? )?
>
> [untested]
>

I had been trying that to no avail because of the unlimited amount of
lookahead required (which defeats the streaming effort).  Syntactic
lookahead came to the rescue, however, and I was able to write it as:

  Prologue()
  (
    Update1()
    (
      // This syntactic lookahead is necessitated by the optional
      // trailing semicolon and prologue
      LOOKAHEAD( <SEMICOLON> Prologue() ( <LOAD> | <CLEAR> | <DROP> | <ADD> |
                   <MOVE> | <COPY> | <CREATE> | <WITH> | <DELETE> | <INSERT> |
                   <USING> | <INSERT_DATA> | <DELETE_DATA> | <DELETE_WHERE> ) )
      <SEMICOLON>
      Prologue()
      Update1()
    )*
    (
      <SEMICOLON>
      Prologue()
    )?
  )?
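
To make the shape of that loop easier to follow, here is a minimal
hand-written sketch in plain Java.  It is not the JavaCC-generated parser
and not what is in the branch; the class name, the Tok enum, and the idea
of collapsing each prologue declaration and each update operation into a
single token are all assumptions made purely for illustration.  The point
it shows is that the lookahead only scans a semicolon, the prologue, and
one operation keyword, never the body of the operation itself.

```java
import java.util.List;

// Hypothetical, simplified sketch: NOT the JavaCC-generated parser.
final class UpdateShapeSketch {
    // Each prologue declaration (BASE/PREFIX) and each update operation
    // is collapsed into one token so only the shape of the request remains.
    enum Tok { DECL, OP, SEMI }

    // Prologue ::= ( DECL )* ; returns the index just past the prologue.
    static int skipPrologue(List<Tok> toks, int i) {
        while (i < toks.size() && toks.get(i) == Tok.DECL) {
            i++;
        }
        return i;
    }

    // Accepts: Prologue ( Update1 ( ';' Prologue Update1 )* ( ';' Prologue )? )?
    static boolean accepts(List<Tok> toks) {
        int n = toks.size();
        int i = skipPrologue(toks, 0);
        if (i == n) {
            return true;   // empty request or lone prologue: allowed by rule 29
        }
        if (toks.get(i) != Tok.OP) {
            return false;
        }
        i++;
        // Loop body, guarded by the equivalent of the syntactic lookahead:
        // ';' Prologue followed by the start of another operation.
        while (i < n && toks.get(i) == Tok.SEMI) {
            int j = skipPrologue(toks, i + 1);
            if (j >= n || toks.get(j) != Tok.OP) {
                break;     // no further operation; leave the ';' for the tail
            }
            i = j + 1;
        }
        // Optional tail: ';' Prologue
        if (i < n && toks.get(i) == Tok.SEMI) {
            i = skipPrologue(toks, i + 1);
        }
        return i == n;
    }

    public static void main(String[] args) {
        // PREFIX ... insert data { } ; PREFIX ...
        System.out.println(accepts(List.of(Tok.DECL, Tok.OP, Tok.SEMI, Tok.DECL)));
        // operation followed by two semicolons
        System.out.println(accepts(List.of(Tok.OP, Tok.SEMI, Tok.SEMI)));
    }
}
```

The first call in main corresponds to a request with a trailing semicolon
and prologue, which the current grammar permits; the second has a doubled
semicolon and is rejected.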

>
>> This is *almost* equivalent to the grammar in the spec, except for one
>> detail: it does not allow a trailing Prologue(), which the recursive
>> definition allows.  I can't seem to get any closer, mainly due to that
>> optional semicolon and optional trailing prologue (although you cannot
>> have a trailing semicolon if you have a lone trailing prologue).
>>
>> The more I look at the problem, the more I tend to think that maybe
>> the spec's Update grammar is faulty.  I believe it should not allow
>> trailing prologues.  It also should not allow just a prologue and
>> nothing else (Query forbids this).  Examples of queries that I think
>> should be invalid (but are not currently):
>>
>> ==========
>> PREFIX : <http://example.org/>
>> ==========
>> PREFIX : <http://example.org/>
>> insert data { } ;
>> PREFIX : <http://example.org/>
>> ==========
>>
>> Additionally, I would argue that the text of the Update spec [1]
>> contradicts the existing grammar.  Specifically the definition in
>> section 3:
>>      "A request is a sequence of operations and is terminated by
>>      EOF (End of File). Multiple operations are separated by a ';'
>>      (semicolon) character. A semicolon after the last operation
>>      in a request is optional."
>
>
> Sequences can be zero length :-)
>

Hah, I don't like it still :)



>
>>
>> A prologue by itself is not an operation as defined in section 4.3 [2].
>>
>> I would propose to the working group that we instead adopt the
>> following grammar:
>>
>>     [29] Update ::= Prologue Update1 ( ';' Prologue Update1 )* ( ';' )?
>>
>> This could be easily represented in JavaCC as:
>>
>>     Prologue() Update1() ( LOOKAHEAD(2) <SEMICOLON> Prologue()
>>         Update1() )* ( <SEMICOLON> )?
>>
>> The trailing semicolon seems to force us into using an LL(2) parser.
>> I cannot see a way to write this grammar in LL(1).
>>
>> I have three questions that would be nice to have answered before I
>> post a comment to the WG:
>>
>> 1) Is there a non-recursive way to write the existing rule 29 that
>> exactly matches the semantics of the spec?
>> 2) Is there a way to write my proposed rule 29 as LL(1) (even if it has
>> to use recursion)?
>> 3) Would the RDF WG be open to changing the grammar at this point?  I
>> know it is in PR stage, but this would be feedback from attempting
>> implementation.
>>
>> -Stephen
>>
>> [1] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#updateLanguage
>> [2] http://www.w3.org/TR/2012/PR-sparql11-update-20121108/#formalModelGraphUpdate
>>
>
