On 22/09/12 15:41, Stephen Allen wrote:
I am working on JENA-330 (converting the Update parser to streaming)
and I had a couple of questions:

1) What version of cpp do you use to generate arq.jj and sparql_11.jj?
  My version inserts a bunch of extra newline characters.   cpp (GCC)
3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

cpp (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

but I used to use cpp under cygwin.

The cygwin output might need feeding through dos2unix.

2) How important is the TripleCollector "mark" functionality?  It
appears to be in use in the Collection and PropertyList parsing stages
to ensure that statements are added to the QuadAcc in the same order
that they appear in the query.  However, RDF is unordered, so it
doesn't seem strictly necessary.  In a streaming situation, its
presence complicates things.  Can I simply eliminate this
functionality?  Or is it important for some reason I can't see?

The mark is for RDF lists and nested structures.

 :s :p [ :q :r ] .
==>
 :s :p _:b0 .
 _:b0 :q :r


  :s :p (1 2)
==>
 :s :p _:b0 .
 _:b0 rdf:first 1 .
 _:b0 rdf:rest _:b1 .
 _:b1 rdf:first 2 .
 _:b1 rdf:rest rdf:nil


It keeps the generated triples in the order they are encountered in the AST. A list element refers to the next element, so you can't generate its rdf:rest until you know what to refer to. The mark also keeps the rdf:first and rdf:rest together (for appearance's sake, such as when printing the query or update).
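As a rough illustration of what a mark does, here is a minimal Python sketch. It is not the ARQ code: the class MarkingCollector and the functions parse_bnode_property_list and parse_triple are hypothetical names made up for this example, and the blank node label is hard-coded. The point it shows is that the nested production only returns its node after it has emitted its own triples, so the enclosing triple is inserted back at the marked position to keep source order.

```python
class MarkingCollector:
    """Hypothetical triple accumulator with a mark (illustration only)."""

    def __init__(self):
        self.triples = []

    def mark(self):
        # Remember the current insertion position.
        return len(self.triples)

    def add(self, triple):
        self.triples.append(triple)

    def add_at(self, index, triple):
        # Insert at a previously marked position; later triples shift down.
        self.triples.insert(index, triple)


def parse_bnode_property_list(acc, pairs):
    """Stand-in for parsing `[ p1 o1 ; p2 o2 ]`: emits the inner triples,
    then returns the blank node, as a grammar production would."""
    bnode = "_:b0"  # assumption: a real parser allocates a fresh label
    for predicate, obj in pairs:
        acc.add((bnode, predicate, obj))
    return bnode


def parse_triple(acc, subject, predicate, pairs):
    """Stand-in for parsing `subject predicate [ ... ]`."""
    position = acc.mark()                       # where the outer triple belongs
    obj = parse_bnode_property_list(acc, pairs)  # inner triples are added first
    acc.add_at(position, (subject, predicate, obj))


acc = MarkingCollector()
parse_triple(acc, ":s", ":p", [(":q", ":r")])
for t in acc.triples:
    print(t)
```

The insert at an earlier position is what gets in the way of streaming: anything already accumulated after the mark cannot be handed downstream until the enclosing triple has been placed.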

It's probably not necessary to do it with a mark. It might be possible to do it as a sliding window of two elements; I have done this on an experimental datastructure project so as to operate in the forward direction. Working forwards is a tail recursion and can be loopified. Working on the way back out isn't streaming (it needs stack depth).
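One way the two-element window could look for a flat list is sketched below in Python. This is an assumption about the shape of such a loop, not code from ARQ or from the experimental project; stream_list and the string constants are hypothetical names. It holds only the current element and the one after it, so each cell's rdf:first and rdf:rest can be emitted together as soon as the next element (or the end of the list) is seen.

```python
from itertools import count

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def stream_list(subject, predicate, items):
    """Yield triples for `subject predicate (items...)` in one forward pass,
    holding at most two list elements at a time."""
    labels = (f"_:b{i}" for i in count())  # fresh blank node labels
    it = iter(items)
    try:
        current = next(it)
    except StopIteration:
        # The empty list is just rdf:nil.
        yield (subject, predicate, RDF_NIL)
        return

    cell = next(labels)
    yield (subject, predicate, cell)

    for upcoming in it:
        # We now know there is a next element, so its cell can be named.
        next_cell = next(labels)
        yield (cell, RDF_FIRST, current)
        yield (cell, RDF_REST, next_cell)
        cell, current = next_cell, upcoming

    # Last element: its rest is rdf:nil.
    yield (cell, RDF_FIRST, current)
    yield (cell, RDF_REST, RDF_NIL)


for triple in stream_list(":s", ":p", [1, 2]):
    print(triple)
```

Because it is a plain loop over an iterator, the input can itself be a generator of arbitrary length and no stack depth is needed. Nested lists are not handled in this sketch.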

It gets messy with nested structures:

  :s :p (1 ("a" "b") 2 )

keeping the rdf:firsts in order of 1, "a", "b", 2 is nice albeit not necessary.

One approach is that it's streaming except for compound structures. You have to ask how you get a compound structure in the first place.

I think the important cases are

:s :p (
  "item 1"
  "item 2"
) .

:s :p [
  :q 1 ;
  :q 2 ;
] .


where it's easy to generate a huge item worth streaming. If these can be handled, but the more complex ones don't stream, it's still a big win IMO.

        Andy


Thanks!

-Stephen

