On 01/02/13 17:44, Paul Gearon wrote:
On Fri, Feb 1, 2013 at 12:21 PM, Andy Seaborne <[email protected]> wrote:
On 01/02/13 16:51, Paul Gearon wrote:
On Fri, Feb 1, 2013 at 10:38 AM, Andy Seaborne <[email protected]
<mailto:[email protected]>> wrote:
Nasty things, RDF Alt/Bag/Seq, from the POV of a database.
Could I get your perspective on this please? I accept that best practice
abandoned them long ago, and RDF 1.1 is deprecating them. I also
appreciate the mathematical elegance of the list structure. However, I
don't understand why Containers are considered nasty... with the
exception of the rdf:_nnn properties. Is that what you're referring to?
Or is it something else?
It's the DB implications, not the modelling issues, I was poking at.
?x rdfs:member ?o
is basically a whole-DB scan looking for rdf:_NNN.
Yuk.
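Here is a minimal Python sketch that shows the cost. It assumes a store that is just an unordered collection of (subject, predicate, object) string triples, with no index on predicate IRIs. The function name `members_by_scan` is hypothetical. The set of rdf:_NNN properties is unbounded, so every triple has to be examined:

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# rdf:_n where n is a decimal integer > 0 with no leading zeros.
MEMBERSHIP = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)$")


def members_by_scan(triples):
    """Answer `?x rdfs:member ?o` over unindexed (s, p, o) triples.

    Any predicate could be one of the rdf:_NNN properties, so with no
    IRI-ordered index the only option is to test every triple.
    Returns (subject, index, object) tuples in scan order.
    """
    results = []
    for s, p, o in triples:  # full scan: O(size of the store)
        m = MEMBERSHIP.match(p)
        if m:
            results.append((s, int(m.group(1)), o))
    return results
```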
I've always dealt with it in one of two ways:
a) Look for rdfs:member (often not possible, but useful when available)
b) Use an index that orders by IRI. Since most indexes are tree-based,
this is easy enough. In Mulgara's case I created a "magic" predicate that
could match IRIs by prefix, then looked for the prefix
http://www.w3.org/1999/02/22-rdf-syntax-ns#_. We have a few range lookups
on literals, and IRIs are stored similarly, so the code to find and
join this data to the triples already existed.
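Here is a minimal sketch of option (b). It models the IRI index as a Python list of IRI strings kept in lexical order. This stands in for a B-tree and is not Mulgara's actual code, and `prefix_range` is a hypothetical name. All IRIs that share a prefix form one contiguous run, so two binary searches find it:

```python
import bisect

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBER_PREFIX = RDF + "_"


def prefix_range(sorted_iris, prefix):
    """Return the contiguous run of sorted_iris that starts with prefix.

    The lower bound is the prefix itself. The upper bound is the prefix with
    its last character bumped by one code point: every string starting with
    prefix sorts below it, and no other string falls in between.
    """
    lo = bisect.bisect_left(sorted_iris, prefix)
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect.bisect_left(sorted_iris, upper, lo)
    return sorted_iris[lo:hi]
```

Lexical order puts rdf:_10 before rdf:_2. This finds the membership properties cheaply, but not in numeric order.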
TDB indexes are over 8-byte NodeIds, so they are not sorted by IRI.
But they could be :-)
By using the inline NodeId encodings, certain well-known IRIs could get
short-form NodeIds (but this is a format change: old/current and new
won't mix properly, as they will not compare equal by NodeId).
rdf:_ could have its own inline NodeId space using, say, 32 bits for the
number part. Then they would be sorted by NodeId.
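Here is a sketch of what such an inline encoding might look like. It assumes a 64-bit NodeId whose top byte is a type tag and whose low 32 bits hold n. The tag value `0x7E` and all function names are hypothetical, and this is not TDB's real NodeId layout. Because n sits in the low bits under a fixed tag, unsigned integer order on NodeIds equals numeric order on n. All membership properties therefore occupy a single NodeId range:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBER_PREFIX = RDF + "_"

# Hypothetical type tag stored in the top byte of a 64-bit NodeId.
# Illustrative value only, not TDB's actual encoding.
TAG_RDF_MEMBER = 0x7E
MAX_N = 2**32 - 1  # n is stored in the low 32 bits


def encode_member(n: int) -> int:
    """Pack rdf:_n into a NodeId: tag in bits 56-63, n in bits 0-31."""
    if not 1 <= n <= MAX_N:
        raise ValueError("n must be between 1 and 2**32 - 1")
    return (TAG_RDF_MEMBER << 56) | n


def decode_member(node_id: int):
    """Return n if node_id is an inline rdf:_n NodeId, otherwise None."""
    # Bits 32-63 must be exactly the tag followed by zero padding.
    if node_id >> 32 != TAG_RDF_MEMBER << 24:
        return None
    return node_id & 0xFFFFFFFF


def member_iri_to_node_id(iri: str):
    """Return the inline NodeId for an rdf:_n IRI, or None if it is not one."""
    if not iri.startswith(MEMBER_PREFIX):
        return None
    digits = iri[len(MEMBER_PREFIX):]
    # Accept only ASCII digits with no leading zero (the rdf:_n pattern).
    if not (digits.isascii() and digits.isdigit()) or digits.startswith("0"):
        return None
    n = int(digits)
    return encode_member(n) if n <= MAX_N else None


# Every inline rdf:_n NodeId lies in this closed range, so a single index
# range scan finds them all, already ordered by n.
MEMBER_RANGE = (encode_member(1), encode_member(MAX_N))
```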
Even with ?x, you need to scan for rdf:_1, rdf:_2, etc., up to rdf:_99186.
Could handle it specially ... but Jena does not (ditto lists).
Well, at least lists now have support via property paths in SPARQL.
While property paths give you the tools you need to read and manipulate
lists, there's still some work to be done for the programmer who uses them.
We probably need to get out a tutorial on common list operations in SPARQL,
since I find that to be a very common question.
A trie on IRIs?
The IRIs here all start with the same prefix, so I find that a simple
ordered index is enough.
There are modelling issues too:
You can add a second rdf:_1 and have two of them. In a Seq?!
Merge two seqs with the same URI and you get ... a mess. Lists at least
will be bnodes.
Not easy at the modeling level, no. Programming in the guts of the DB is
easy enough, since you can just append one container to another, or
whatever approach you choose. However, once the operations get abstracted
up the stack (in SPARQL, for instance), then it becomes something between
hard and intractable.
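Here is a small sketch of the merge problem. It treats a Seq as a plain set of (subject, predicate, object) triples and a graph merge as set union, and the helper names are hypothetical. Union keeps both rdf:_1 triples. An in-DB "append" has to renumber the incoming members after the current highest index instead:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBER_PREFIX = RDF + "_"


def member(n):
    """IRI of the container membership property rdf:_n."""
    return MEMBER_PREFIX + str(n)


def seq_triples(subject, items):
    """Encode items as a Seq: subject rdf:_1 item0, subject rdf:_2 item1, ..."""
    return {(subject, member(i), item) for i, item in enumerate(items, start=1)}


def seq_items(triples, subject):
    """Read back (index, object) pairs for subject, sorted by index."""
    pairs = []
    for s, p, o in triples:
        if s == subject and p.startswith(MEMBER_PREFIX):
            pairs.append((int(p[len(MEMBER_PREFIX):]), o))
    return sorted(pairs)


def append_seq(triples, subject, items):
    """Append items after the highest existing index, renumbering them."""
    start = max((i for i, _ in seq_items(triples, subject)), default=0)
    return triples | {
        (subject, member(start + k), item) for k, item in enumerate(items, start=1)
    }


a = seq_triples("ex:s", ["x", "y"])
b = seq_triples("ex:s", ["z"])
merged = a | b                          # naive graph merge: two rdf:_1 members
appended = append_seq(a, "ex:s", ["z"])  # "guts of the DB" append: x, y, z
```

Set semantics also means that if both Seqs held the same item at the same index, the two triples would collapse into one.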
Practically:
No Turtle support
You/app writer see the horrible encoding.
I've started dropping RDF/XML support, so that's certainly an issue. I do
like Turtle list syntax.
:-)
Really, RDF needs a list-ish thing as a first-class data type, not encoded
as triples. If it is triples, then when you work with the data you see
triples, and the app or library has to reconstruct the list. You can also
have malformed encodings in triples, and the code has to cope.
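To show the reconstruction burden, here is a sketch that reads an rdf:first/rdf:rest collection back into a Python list. It assumes string terms. It treats two cases as errors: cycles, and nodes without exactly one rdf:first and one rdf:rest. That is one possible policy. `read_list` and `MalformedList` are hypothetical names:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


class MalformedList(ValueError):
    """Raised when the triples do not encode a well-formed RDF collection."""


def read_list(triples, head):
    """Rebuild a Python list from the RDF collection starting at head."""
    # Index rdf:first / rdf:rest values by (node, predicate).
    index = {}
    for s, p, o in triples:
        if p in (FIRST, REST):
            index.setdefault((s, p), []).append(o)

    items, seen, node = [], set(), head
    while node != NIL:
        if node in seen:  # rdf:rest loops back on itself
            raise MalformedList(f"cycle at {node}")
        seen.add(node)
        firsts = index.get((node, FIRST), [])
        rests = index.get((node, REST), [])
        if len(firsts) != 1 or len(rests) != 1:
            raise MalformedList(f"{node} needs exactly one rdf:first and one rdf:rest")
        items.append(firsts[0])
        node = rests[0]
    return items
```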
James Leigh proposed this before the new RDF WG was started. Working with
Datomic right now (a non-RDF triple-store, which has no support for this
either) has made it very clear how important first-class collections would
be. I ended up adopting RDF-like practices in that case.
Paul
Andy