Re: RDF Patch - experiences suggesting changes

Andy Seaborne Wed, 19 Oct 2016 08:11:08 -0700


On 19/10/16 11:34, Stian Soiland-Reyes wrote:

I had a quick go, and the gzip size penalty for using expanded forms
without "R" was negligible (~ 0.1%, a bit higher with no prefixes). Using
"R" also means you can't process the RDF Patch in a parallel way without
preprocessing.  (Same for prefixes).

Good point ... for certain restricted patches like all QA or all QD where reordering (necessary for parallel processing) is possible.

At this point, specifying RDF Patch v2 without R until the interactions with gzip and similar compression are better understood seems to me to be the way forward.


It's easier to add it later than to add it now and remove it.

FYI:

The RIOT parsers intern Nodes using a 1000-slot LRU cache (so not large) - this leads to 30%, sometimes 50%, less memory being used due to shared terms. In practice, it results in interning all properties in a vocabulary (1000 well-used properties being quite unusual), which R does not do.
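
A minimal sketch of the interning idea, in Python rather than Java. It is not the RIOT code: the class name `InternCache` and its `intern` method are made up for illustration, and only the 1000-slot LRU behaviour is taken from the description above.

```python
from collections import OrderedDict


class InternCache:
    """Small LRU cache that hands back one shared object per equal term.

    Hypothetical illustration only; not the RIOT implementation.
    """

    def __init__(self, slots=1000):
        self.slots = slots
        self._cache = OrderedDict()

    def intern(self, term):
        cached = self._cache.get(term)
        if cached is not None:
            # Seen recently: mark as most recently used, reuse the object.
            self._cache.move_to_end(term)
            return cached
        # First sighting (or evicted earlier): remember this object.
        self._cache[term] = term
        if len(self._cache) > self.slots:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return term
```

Every term that is still in the cache is stored once and shared by all triples that use it, regardless of where it appears in a row. "R" only shares a term with the same position in the previous row.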


        Andy


Using "R" could also restrict possible compression patterns, for instance in:

A <http://example.com/thingie15>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/Person> .
A <http://example.com/thingie15>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/Person> .

a good compression algorithm might recognize patterns in here like:

 .\nA <http://example.com/thingie

<http://www.w3.org/

#type> <http://schema.org/


Using "R" would restrict possible patterns - betting on it recognizing
"> .\nA R R" (which sometimes would work well).
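
One way to check this on your own data is sketched below. It assumes "R" means "same term as in the same position of the previous row"; the helper names (`with_repeats`, `expand_repeats`, `gzip_size`) and the synthetic rows are made up for illustration, and the sizes you get will depend on the data.

```python
import gzip

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
PERSON = "<http://schema.org/Person>"


def with_repeats(rows):
    """Replace a term with "R" when it equals the term in the same
    position of the previous row (assumed meaning of "R")."""
    out, prev = [], None
    for row in rows:
        if prev is not None and len(prev) == len(row):
            terms = ["R" if t == p else t for t, p in zip(row[1:], prev[1:])]
            out.append([row[0]] + terms)
        else:
            out.append(list(row))
        prev = row
    return out


def expand_repeats(rows):
    """Inverse of with_repeats: needs the previous expanded row,
    so rows cannot be handled independently of each other."""
    out, prev = [], None
    for row in rows:
        full = [row[0]] + [prev[i] if t == "R" else t
                           for i, t in enumerate(row[1:], start=1)]
        out.append(full)
        prev = full
    return out


def gzip_size(rows):
    """Size in bytes of the gzip-compressed patch text."""
    text = "".join(" ".join(r) + " .\n" for r in rows)
    return len(gzip.compress(text.encode("utf-8")))


# Synthetic patch: 1000 "A" rows differing only in the subject.
rows = [["A", f"<http://example.com/thingie{i}>", RDF_TYPE, PERSON]
        for i in range(1000)]

print("expanded, gzipped:", gzip_size(rows))
print("with R,   gzipped:", gzip_size(with_repeats(rows)))
```

Comparing the two printed sizes on real patches, with and without prefixes, gives the kind of figure quoted earlier in the thread.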



Can RDF Patch items within a transaction be considered in any order
(first all the DELETEs, then all the ADDs), or do they have to be
played back linearly?
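
A small sketch of why the order can matter, assuming a patch is applied to a plain set of triples and that "A" adds and "D" deletes. The functions `apply_patch` and `deletes_first` are hypothetical names for illustration, not anything defined by RDF Patch.

```python
def apply_patch(graph, ops):
    """Apply (action, triple) pairs to a set of triples, in order."""
    result = set(graph)
    for action, triple in ops:
        if action == "A":
            result.add(triple)
        elif action == "D":
            result.discard(triple)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return result


def deletes_first(ops):
    """Reorder a patch: all deletes, then all adds."""
    return ([op for op in ops if op[0] == "D"]
            + [op for op in ops if op[0] == "A"])


t = (":s", ":p", ":o")
ops = [("A", t), ("D", t)]  # add a triple, then delete it again

linear = apply_patch(set(), ops)                    # t is gone at the end
reordered = apply_patch(set(), deletes_first(ops))  # t is present at the end
print(linear == reordered)
```

When the same triple is both added and deleted in one transaction, the two orders give different results. A patch made up only of adds, or only of deletes, gives the same result in any order.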


On 19 October 2016 at 10:57, Rob Vesse <[email protected]> wrote:

Yes but ANY is a form of lossy compression. You lose the actual details of what 
was removed. Also it can only be used for removals and yields no benefit for 
additions.

 On the other hand REPEAT is lossless compression.

 However if you apply a general-purpose compression like gzip on top of the 
patch you probably get just as good compression without needing any special 
tokens. In my experience repeat is more useful in compact binary formats where 
you can use fewer bytes to encode it than either the term itself or a reference 
to the term in some lookup table.

On 14/10/2016 17:09, "Andy Seaborne" <[email protected]> wrote:

    These two together seem a bit contradictory.  The advantage of ANY, with
    versions, is that it is a form of compression.
