On 19/10/16 11:34, Stian Soiland-Reyes wrote:
I had a quick go, and the gzip penalty for using expanded forms
without "R" was negligible (~0.1%, a bit higher with no prefixes). Using
"R" also means you can't process the RDF Patch in parallel without
preprocessing. (Same for prefixes.)
Good point ... for certain restricted patches like all QA or all QD
where reordering (necessary for parallel processing) is possible.
At this point, specifying RDF Patch v2 without R until the interactions
with gzip and other compression are better understood seems to me to be
the way forward.
It's easier to add later than add now and remove.
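To illustrate the preprocessing point: a minimal Python sketch, assuming "R" means "repeat the term in the same position of the previous row" and that rows are already tokenized into tuples. The function name expand_repeats and the subject thingie16 are made up for illustration; this is not code from any RDF Patch implementation.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
PERSON = "<http://schema.org/Person>"


def expand_repeats(rows):
    """Replace each "R" with the term at the same position in the previous row.

    Assumes every row has the same number of terms as the one before it.
    """
    previous = None
    expanded = []
    for op, *terms in rows:
        if previous is None and "R" in terms:
            raise ValueError('"R" used in the first row, nothing to repeat')
        # Each row depends on the row before it, so this pass is sequential.
        terms = [previous[i] if t == "R" else t for i, t in enumerate(terms)]
        previous = terms
        expanded.append((op, *terms))
    return expanded


rows = [
    ("A", "<http://example.com/thingie15>", RDF_TYPE, PERSON),
    ("A", "<http://example.com/thingie16>", "R", "R"),  # hypothetical second row
]
expanded = expand_repeats(rows)
```

After this pass every row is self-contained, so rows could be handed to separate workers; before it, a row containing "R" cannot be interpreted without the row that precedes it.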
FYI:
The RIOT parsers do interning of Nodes using a 1000 slot LRU cache (so
not large) - this leads to 30%, sometimes 50%, less memory being used
due to shared terms. In practice, it results in interning all properties
in a vocabulary (1000 well-used properties being quite unusual), which
R does not do.
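For readers unfamiliar with the idea, here is a rough Python sketch of interning through a small LRU cache. It is not the RIOT code; the class name InternCache and its method are invented for illustration, and terms are plain strings rather than Nodes.

```python
from collections import OrderedDict


class InternCache:
    """Return one shared object for equal terms, keeping at most `slots` entries."""

    def __init__(self, slots=1000):
        self.slots = slots
        self._cache = OrderedDict()

    def intern(self, term):
        cached = self._cache.get(term)
        if cached is not None:
            # Hit: mark as most recently used and hand back the shared object.
            self._cache.move_to_end(term)
            return cached
        # Miss: remember this term, evicting the least recently used if full.
        self._cache[term] = term
        if len(self._cache) > self.slots:
            self._cache.popitem(last=False)
        return term


cache = InternCache(slots=1000)
first = "".join(["<http://schema.org/", "name>"])
second = "".join(["<http://schema.org/", "name>"])  # equal, but a separate object
shared = cache.intern(first)
also_shared = cache.intern(second)  # same object as `first`; `second` can be dropped
```

Frequently used terms such as properties tend to stay in the cache, so every triple that mentions them ends up pointing at the same object.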
Andy
Using "R" could also restrict possible compression patterns, for instance in:
A <http://example.com/thingie15>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/Person> .
A <http://example.com/thingie15>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/Person> .
a good compression algorithm might recognize patterns in here like:
.\nA <http://example.com/thingie
<http://www.w3.org/
#type> <http://schema.org/
Using "R" would restrict possible patterns - betting on it recognizing
"> .\nA R R" (which sometimes would work well).
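One way to check this rather than guess is to generate the same patch both ways and compare sizes after gzip. The Python sketch below does that on synthetic data; the function names, the numbered thingie subjects and the row count are assumptions for illustration, and results on real patches may well differ.

```python
import gzip

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
PERSON = "<http://schema.org/Person>"


def make_patch(n, use_repeat):
    """Build n add rows, either fully expanded or using "R" after the first row."""
    lines = []
    for i in range(n):
        subject = f"<http://example.com/thingie{i}>"
        if use_repeat and i > 0:
            lines.append(f"A {subject} R R .")
        else:
            lines.append(f"A {subject} {RDF_TYPE} {PERSON} .")
    return ("\n".join(lines) + "\n").encode("utf-8")


def sizes(n=1000):
    """Return {form: (raw_bytes, gzipped_bytes)} for both forms of the patch."""
    result = {}
    for name, use_repeat in (("expanded", False), ("repeat", True)):
        raw = make_patch(n, use_repeat)
        result[name] = (len(raw), len(gzip.compress(raw)))
    return result


if __name__ == "__main__":
    for name, (raw, packed) in sizes().items():
        print(f"{name}: {raw} bytes raw, {packed} bytes gzipped")
```

Synthetic rows like these are far more regular than a real patch, so this only shows how to measure, not what the answer is.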
Can RDF Patch items within a transaction be considered in any order
(first all the DELETEs, then all the ADDs), or do they have to be
played back linearly?
On 19 October 2016 at 10:57, Rob Vesse <[email protected]> wrote:
Yes, but ANY is a form of lossy compression. You lose the actual details of what
was removed. Also it can only be used for removals and yields no benefit for
additions.
On the other hand REPEAT is lossless compression.
However if you apply a general-purpose compression like gzip on top of the
patch you probably get just as good compression without needing any special
tokens. In my experience repeat is more useful in compact binary formats where
you can use fewer bytes to encode it than either the term itself or a reference
to the term in some lookup table.
On 14/10/2016 17:09, "Andy Seaborne" <[email protected]> wrote:
These two together seem a bit contradictory. The advantage of ANY, with
versions, is that it is a form of compression.