Re: "Role of RSS in Science Publishing"

Danny Ayers Sat, 18 Dec 2004 02:57:20 -0800

On Fri, 17 Dec 2004 23:46:53 -0500, Bob Wyman <[EMAIL PROTECTED]> wrote:

> Could you help me out
> here by providing an example of how two sets of triples could be
> concatenated while still preserving provenance?

Please allow me to lead up to an example slowly...
I've not really spent any time around provenance, but three approaches
spring to mind. Well, actually, the first approach is ad hoc-ness on
the part of the RDF library developer: there are quite a few
different specific tricks, e.g. Redland, I believe, associates every
statement with a "context" so the context can provide any provenance
info. These sorts of things are fine internally within a system, but
they aren't much help if you want to share that info and/or use it as
part of RDF (RDFS/OWL) inferencing.

Long-term (i.e. it's unlikely to be in a W3C spec in the near future)
there's Named Graphs, a very promising extension of RDF for which
the theory has been worked out and is nice & intuitive. Simple idea:
you can give any set of statements a URI. (This is kind of what a lot
of people seem to do anyway, e.g. I can consider my RSS feed as a
single, isolated graph with the URI of the feed. But the Named Graph
formalism makes it possible to mix 'em all up and still be able to
reason properly.)
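
To make the Named Graphs idea a bit more concrete for the concatenation
question, here's a minimal sketch in plain Python. No RDF library is
assumed. Each statement is stored as a quad, meaning a triple plus the
URI of the graph it came from. Concatenating two feeds is then just a
set union, and the provenance survives. The feed URIs, the example
triples and the named_graph/merge/sources_of helpers are all
hypothetical, for illustration only.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus the URI of the named graph it was asserted in."""
    subject: str
    predicate: str
    obj: str
    graph: str


def named_graph(graph_uri, triples):
    """Treat one source's triples as a single graph: tag each with the graph URI."""
    return {Quad(s, p, o, graph_uri) for (s, p, o) in triples}


def merge(*graphs):
    """Concatenate any number of named graphs; every quad keeps its graph name."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def sources_of(store, s, p, o):
    """Return the URIs of all graphs that assert the triple (s, p, o)."""
    return {q.graph for q in store if (q.subject, q.predicate, q.obj) == (s, p, o)}


# Two hypothetical feeds, each treated as one named graph.
feed_a = named_graph("http://example.org/feedA", [
    ("mydata:item123", "dc:subject", '"fishing"'),
    ("mydata:item123", "dc:title", '"Trout season"'),
])
feed_b = named_graph("http://example.org/feedB", [
    ("mydata:item123", "dc:subject", '"fishing"'),
])

store = merge(feed_a, feed_b)
print(len(store))  # 3
print(sorted(sources_of(store, "mydata:item123", "dc:subject", '"fishing"')))
```

The triple asserted by both feeds is kept twice, once per graph. That
duplication is exactly what would let a consumer apply a per-source
trust policy later on.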

To quote from [1]:
"Named Graphs allow publishers to communicate assertional intent, and
to sign their graphs; information consumers can evaluate specific
graphs using task-specific trust policies, and act on information from
those Named Graphs that they accept."

A third approach to provenance, reification, is already supported by
RDF. I still get very confused over this; Shelley Powers called it
"The Big Ugly". It's a bit like quotation but /different/, and there's
an explanation in the Primer [2]. I've not used it myself, but a
presentation I saw last week convinced me the approach can work: Rich
Boakes, on his RDFX [3] project, uses reification expressly for the
purpose of provenance, as his slides [4] make clear.

The basic idea is that, given a triple such as:

mydata:item123 dc:subject "fishing" .

you can reify it by adding the triples:

mydata:triple321 rdf:type rdf:Statement .
mydata:triple321 rdf:subject mydata:item123 .
mydata:triple321 rdf:predicate dc:subject .
mydata:triple321 rdf:object "fishing" .

Now you can say things about the triple, e.g.

mydata:triple321 dc:source <http://example.org/feed> .
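
Here's a rough sketch of the same reification steps in plain Python.
It uses tuples for triples and CURIE-style strings rather than a real
RDF library. The statement ID mydata:triple321 follows the example
above. The reify() helper itself is hypothetical.

```python
Triple = tuple[str, str, str]


def reify(stmt_id: str, triple: Triple) -> list[Triple]:
    """Describe `triple` as an rdf:Statement resource called `stmt_id`.

    The description does not itself assert the original triple;
    it only talks about it.
    """
    s, p, o = triple
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]


original: Triple = ("mydata:item123", "dc:subject", '"fishing"')
description = reify("mydata:triple321", original)
# Now we can say things about the statement, e.g. where it came from.
provenance: Triple = ("mydata:triple321", "dc:source", "<http://example.org/feed>")

knowledge_base = [original, *description, provenance]
for t in knowledge_base:
    print(" ".join(t), ".")  # Turtle-ish output, one triple per line
```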

So whenever a statement is added to your knowledge base, another 5+ are
thrown in too. I asked Rich the obvious question: yes, it is a big
overhead in terms of the amount of data that needs to be stored, but
(if I remember correctly) it didn't cost much in terms of
computational complexity or code difficulty. He just saw the extra bulk
as a necessary cost.
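
To put a rough number on that bulk: if every incoming statement is
reified and given one dc:source triple, each statement costs six stored
triples instead of one. Here's a minimal sketch, assuming a
hypothetical ingest() helper and made-up blank-node labels for the
statements.

```python
def ingest(triples, source):
    """Store each triple plus its reification and a dc:source triple (hypothetical helper)."""
    stored = []
    for n, (s, p, o) in enumerate(triples, start=1):
        # Made-up label; it restarts at 1 on every call, so a real store
        # would need globally unique statement IDs.
        stmt = f"_:stmt{n}"
        stored.append((s, p, o))  # the statement itself
        stored.extend([
            (stmt, "rdf:type", "rdf:Statement"),
            (stmt, "rdf:subject", s),
            (stmt, "rdf:predicate", p),
            (stmt, "rdf:object", o),
            (stmt, "dc:source", source),  # the provenance we actually wanted
        ])
    return stored


feed = [(f"mydata:item{i}", "dc:subject", '"fishing"') for i in range(100)]
stored = ingest(feed, "<http://example.org/feed>")
print(len(feed), len(stored))  # 100 600
```

The ratio is fixed at 6:1 whatever the feed size. That matches the
point that the cost is mostly storage. The work per statement is just a
handful of extra appends.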

Cheers,
Danny.

[1] http://www.hpl.hp.com/techreports/2004/HPL-2004-57.html
[2] http://www.w3.org/TR/rdf-primer/#reification
[3] http://www.rdfx.org/
[4] http://www.rdfx.org/docs/presentation/SWAP_Boakes_AFrameworkForUnifiedInformationBrowsing.ppt

http://dannyayers.com
