On Fri, 17 Dec 2004 23:46:53 -0500, Bob Wyman <[EMAIL PROTECTED]> wrote:
> Could you help me out here by providing an example of how two sets of
> triples could be concatenated while still preserving provenance?

Please allow me to lead up to an example slowly...

I've not really spent any time around provenance, but three approaches spring to mind.

Well, actually, the first approach is ad hoc-ness on the part of the RDF library developer: there are quite a few different specific tricks. For example, Redland, I believe, associates every statement with a "context", so the context can carry any provenance info. These sorts of things are fine internally within a system, but aren't much help if you want to share that info and/or use it as part of RDF (RDFS/OWL) inferencing.

Long-term (i.e. it's unlikely to be in a W3C spec in the near future) there's Named Graphs, a very promising extension of RDF for which the theory has been worked out and is nice & intuitive. Simple idea: you can give any set of statements a URI. (This is kind of what a lot of people seem to do anyway, e.g. I can consider my RSS feed as a single, isolated graph with the URI of the feed. But the Named Graph formalism makes it possible to mix 'em all up and still be able to reason properly.) To quote from [1]:

  "Named Graphs allow publishers to communicate assertional intent, and to sign their graphs; information consumers can evaluate specific graphs using task-specific trust policies, and act on information from those Named Graphs that they accept."

A third approach to provenance is already supported by RDF: reification. I still get very confused over this; Shelley Powers called it "The Big Ugly". It's a bit like quotation but /different/; there is an explanation in the Primer [2]. I've not used it myself, but a presentation I saw last week convinced me the approach can work: Rich Boakes, on his RDFX [3] project, uses reification expressly for the purpose of provenance, as his slides [4] make clear. The basic idea is that, say, for a triple:

  mydata:item123 dc:subject "fishing" .
you can reify it by adding the triples:

  mydata:triple321 rdf:type rdf:Statement .
  mydata:triple321 rdf:subject mydata:item123 .
  mydata:triple321 rdf:predicate dc:subject .
  mydata:triple321 rdf:object "fishing" .

Now you can say things about the triple, e.g.

  mydata:triple321 dc:source <http://example.org/feed> .

So whenever a statement is added to your knowledgebase, another five or more are thrown in too (the four reification triples plus at least one provenance statement). I asked Rich the obvious question: yes, it is a big overhead in terms of the amount of data that needs to be stored, but (if I remember correctly) it wasn't a great cost in terms of computational complexity or difficult code. He just saw the extra bulk as a necessary cost.

Cheers,
Danny.

[1] http://www.hpl.hp.com/techreports/2004/HPL-2004-57.html
[2] http://www.w3.org/TR/rdf-primer/#reification
[3] http://www.rdfx.org/
[4] http://www.rdfx.org/docs/presentation/SWAP_Boakes_AFrameworkForUnifiedInformationBrowsing.ppt

-- 
http://dannyayers.com
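Applied to Bob's question, the Named Graphs idea described above can be illustrated with a minimal Python sketch. It assumes a toy in-memory store of quads (subject, predicate, object, graph URI) rather than any particular RDF library, and the feed URIs and helper names are hypothetical. Two sets of triples are concatenated into one dataset, yet every triple still carries the URI of the graph that asserted it, so a consumer can apply a simple trust policy of the kind the quote from [1] describes.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Quad(NamedTuple):
    """A triple plus the URI of the named graph it was asserted in."""
    subject: str
    predicate: str
    obj: str
    graph: str


def name_graph(triples: Iterable[tuple[str, str, str]], graph_uri: str) -> set[Quad]:
    """Give a set of statements a URI by tagging each triple with it (hypothetical helper)."""
    return {Quad(s, p, o, graph_uri) for (s, p, o) in triples}


def merge(*graphs: set[Quad]) -> set[Quad]:
    """Concatenate named graphs; provenance survives because the graph URI is part of each quad."""
    merged: set[Quad] = set()
    for g in graphs:
        merged |= g
    return merged


def sources_of(dataset: set[Quad], s: str, p: str, o: str) -> set[str]:
    """Return the URIs of every graph that asserts the triple (s, p, o)."""
    return {q.graph for q in dataset if (q.subject, q.predicate, q.obj) == (s, p, o)}


def accepted_triples(dataset: set[Quad], trusted: set[str]) -> set[tuple[str, str, str]]:
    """A toy trust policy: keep only triples that come from graphs the consumer accepts."""
    return {(q.subject, q.predicate, q.obj) for q in dataset if q.graph in trusted}


# Two hypothetical feeds, each treated as one named graph.
feed_a = name_graph(
    [("mydata:item123", "dc:subject", '"fishing"'),
     ("mydata:item123", "dc:title", '"Trout season"')],
    "http://example.org/feedA",
)
feed_b = name_graph(
    [("mydata:item123", "dc:subject", '"fishing"'),
     ("mydata:item456", "dc:subject", '"sailing"')],
    "http://example.org/feedB",
)

dataset = merge(feed_a, feed_b)
print(len(dataset))  # 4: the shared triple is kept once per graph
print(sorted(sources_of(dataset, "mydata:item123", "dc:subject", '"fishing"')))
print(sorted(accepted_triples(dataset, {"http://example.org/feedB"})))
```

The design choice here is simply to widen each triple to a quad, which is roughly the same shape as the per-statement "context" mentioned above; the difference Named Graphs make is that the fourth element is a URI with agreed semantics, so it can be shared and reasoned about.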

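The reification recipe Danny walks through (four rdf:Statement triples plus a dc:source statement) can also be sketched in Python. This is an illustration under stated assumptions, not RDFX's code: triples are plain string tuples in a set, the statement identifiers (mydata:triple1, mydata:triple2, ...) come from a hypothetical counter, and dc:source is used as the provenance property, as in the example above. Because reifying a triple does not itself assert it in RDF, the original triple is stored alongside its reification.

```python
from __future__ import annotations

from itertools import count

Triple = tuple[str, str, str]

_ids = count(1)  # hypothetical ID minting; any scheme producing unique URIs would do


def reify(triple: Triple, source: str) -> list[Triple]:
    """Return the triple itself, its four reification triples, and a dc:source statement."""
    s, p, o = triple
    stmt = f"mydata:triple{next(_ids)}"  # hypothetical statement URI
    return [
        triple,  # reification does not assert the triple, so keep the original too
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
        (stmt, "dc:source", f"<{source}>"),
    ]


def add_with_provenance(store: set[Triple], triples: list[Triple], source: str) -> None:
    """Concatenate triples into one store, reifying each so its source is recorded."""
    for t in triples:
        store.update(reify(t, source))


def sources_of(store: set[Triple], triple: Triple) -> set[str]:
    """Collect dc:source values of every reified statement that describes `triple`."""
    by_stmt: dict[str, dict[str, str]] = {}
    for subj, pred, obj in store:
        # Group properties by subject so each statement node can be inspected as a whole.
        by_stmt.setdefault(subj, {})[pred] = obj
    return {
        props["dc:source"]
        for props in by_stmt.values()
        if props.get("rdf:type") == "rdf:Statement"
        and "dc:source" in props
        and (props.get("rdf:subject"), props.get("rdf:predicate"), props.get("rdf:object")) == triple
    }


store: set[Triple] = set()
fishing = ("mydata:item123", "dc:subject", '"fishing"')
sailing = ("mydata:item456", "dc:subject", '"sailing"')
add_with_provenance(store, [fishing], "http://example.org/feedA")
add_with_provenance(store, [fishing, sailing], "http://example.org/feedB")

print(len(store))  # 17: the shared "fishing" triple collapses, its two reifications do not
print(sorted(sources_of(store, fishing)))
```

Each asserted triple drags five more in with it here, which is the storage overhead Rich accepted; the lookup itself stays a simple scan, in line with his view that the computational cost was modest.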