re. RDF 102 s.v.p...

Danny Ayers Sat, 28 Aug 2004 11:26:28 -0700

Big flashing lights on my radar - I'm a fan of RDF, and Gump is an
exceedingly neat idea. So, one or two comments on top of Stefano's
comments (sorry about the quoting):


[[
> basically this is like the RSS and Atom feeds that Gump puts out,
> except they also have data at the module level (for all projects within 
> module). Basically, I figured that folks might sometimes want specific 
> information, and sometimes want it all (to feed into some store).

Kewl.
]]

Yep.

[[
> I started out pretty simply, Gump defines some classes (Project, 
> Repository) and some properties (e.g. name) and then makes some 
> statements (Project:X depends upon Project:Y, Project:X resides within 
> Repo:Z). Nothing complicated, but a start.

nice.
]]

The vocab is nice. A while back I did some work on a project vocab [1]
but spent far too much time on the terms and not enough on doing stuff
with it - got bogged down, bloat, overengineered. So I reckon just
starting with a few terms and actually *using* them is the best route
forward.

[[
> Even this small foray allowed me to come up with some questions, and 
> want more input:

<semantic-web-hat mode="on">

Copying Dirk since he's a semweb fan as much as I am.

> Some areas to look into:
> 
> 1) Design Decisions/Questions:
> 
> 1.1) Ought we define the URI for a project (or other entity) to point to 
> the standalone RDF for that entity? I'm sure there is no need to, but it 
> might allow tools to discover upon demand.

This would be a URL and my suggestion would be something like

http://gump.apache.org/data/path/project/20040827
]]

It could well be helpful, but I'd suggest being careful about what
statements are made involving the URI - i.e. does that resource
actually identify the project? If so a direct rdf:about to it is ok,
but otherwise a technique that's getting popular is to have an
rdfs:seeAlso to it. There's not the same level of commitment, but
automatic tools can still pick up the data. (Other statements such as
the type of the resource can also be made, but aren't necessary from
day one).
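A minimal Python sketch of the rdfs:seeAlso approach, writing N-Triples by hand. The project URI is the one used in the RDF/XML example further down and the data URL is Stefano's suggested form. The GUMP namespace assumes the proposed schema URI, and typing the project as gump:Project is the optional extra statement mentioned above.

```python
# Sketch: point from the project URI to the document describing it with
# rdfs:seeAlso, rather than treating the document URL as the project itself.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
# Assumption: the proposed Gump schema URI is used as the namespace.
GUMP = "http://gump.apache.org/schemas/main/1.0/"

def describe(project_uri: str, data_url: str) -> list:
    """Return N-Triples lines: an (optional) type plus the seeAlso link."""
    return [
        f"<{project_uri}> <{RDF_TYPE}> <{GUMP}Project> .",
        f"<{project_uri}> <{RDFS_SEE_ALSO}> <{data_url}> .",
    ]

for line in describe("http://apache.org/gump/project/xml-xerces/",
                     "http://gump.apache.org/data/path/project/20040827"):
    print(line)
```

A crawler that understands rdfs:seeAlso can fetch the data URL, while nothing is claimed about what that URL itself identifies.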


[[
> 1.2) What if there are two sources of RDF triples about an entity? Say 
> we have facts in a standalone document, and in a shared one (or in a 
> triple store)? Are triples merged? 

Yes, that's the beauty of the RDF model: you can have statements coming 
from different sources, and they get aggregated.

> What if they clash with each other? 
> [e.g. one source says X dependsOn Y, but another says Y dependsOn X or 
> something contradictory?]
[snip]

]]

Just to expand on Stefano's comments a little - the basic RDF model
would in effect see both/all the statements as being true. ANDed if
you like. There are tricks (particularly at the OWL level) which can
be used to spot inconsistencies using general-purpose inferencing, but
then again there's nothing to stop more application-specific reasoning
being coded on top of the RDF data model.
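To make that concrete, here is a rough Python sketch, with triples as plain tuples and prefixed names such as gump:X standing in for full URIs. Merging two sources is just set union (everything is "ANDed"); spotting a mutual dependency is an application-specific check layered on top, not something RDF does for you.

```python
# Sketch: an RDF-style merge is the union of the triple sets; an
# application-specific rule (not part of RDF) then flags X<->Y cycles.
def merge(*graphs):
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def mutual_dependencies(graph, predicate="gump:dependsOn"):
    """Return pairs (x, y) where x dependsOn y and y dependsOn x, once each."""
    deps = {(s, o) for s, p, o in graph if p == predicate}
    return {(x, y) for x, y in deps if (y, x) in deps and x < y}

standalone = {("gump:X", "gump:dependsOn", "gump:Y")}
shared = {("gump:Y", "gump:dependsOn", "gump:X"),
          ("gump:X", "gump:residesWithin", "gump:Z")}
g = merge(standalone, shared)
print(len(g))                  # 3
print(mutual_dependencies(g))  # {('gump:X', 'gump:Y')}
```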


[[
> 1.3) How do we define a URI to represent a long lived (yet varying) 
> entity? 

eheh, great question ;-)
]]

Easy! My home page is http://dannyayers.com. The representations of it
(the HTML & RSS) vary a lot, but conceptually it's the same entity.

[[
> Ought we (say) include the version of Cocoon in the URI, so we 
> know facts about that release/state, or do we just say Cocoon? 

I'm a big fan of numerical URIs for long-term persisting things. The 
less implicit semantics in the URI, the higher the chance of surviving 
changes without requiring the URI to change.
]]

If I understand Stefano correctly, I agree: if it's worth saying,
make it explicit as additional statements, since the URI is opaque to
(most) machine processing.
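A small Python sketch of that idea, assuming a hypothetical numeric project URI and a hypothetical gump:version property. The identifier never changes; a new release only changes which version statement is made about it.

```python
# Sketch: an opaque, numeric URI (hypothetical scheme) plus explicit
# statements for anything worth saying, such as the current version.
PROJECT = "http://gump.apache.org/project/1042"  # hypothetical URI

def facts(version):
    """Triples describing the project; gump:version is a hypothetical term."""
    return {(PROJECT, "gump:name", "cocoon"),
            (PROJECT, "gump:version", version)}

old, new = facts("1.0"), facts("1.1")
changed = old ^ new  # symmetric difference: statements that differ
print(sorted(changed))
# Only the two version statements differ; every triple keeps the same subject URI.
```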

[[
> If Cocoon 
> dependsOn Avalon today, but not tomorrow, what happens to the Cocoon 
> dependsOn Avalon triple? Is it wrong? Expired?

This is where it starts to get very tricky.
]]

Yup ;-)

[provenance through reification snipped]

[[
Well, I would just create a new model every time, 
...
]]

Yep, that's the easiest: only retain the current version of the
model/graph/document at the location which is going to have its data
processed.
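A quick Python sketch of what "expiry by replacement" amounts to: compare the previously published graph with the new one, and whatever is missing has expired. The prefixed names and the second dependency are illustrative only.

```python
# Sketch: each Gump run publishes a fresh graph; comparing two runs shows
# which triples "expired" (dropped) and which are new.
def changes(previous, current):
    """Return (expired, added) triple sets between two published graphs."""
    return previous - current, current - previous

yesterday = {("gump:cocoon", "gump:dependsOn", "gump:avalon"),
             ("gump:cocoon", "gump:dependsOn", "gump:xml-xerces")}
today = {("gump:cocoon", "gump:dependsOn", "gump:xml-xerces")}

expired, added = changes(yesterday, today)
print(expired)  # {('gump:cocoon', 'gump:dependsOn', 'gump:avalon')}
print(added)    # set()
```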

Another alternative would be to wrap up the dependencies in a little
cluster with a timestamp, something like:

 Snapshot  containsDependency Avalon
 Snapshot  date 2004-08-28

I suspect this may be complicating matters unnecessarily though -
letting the triples 'expire' through their absence in the latest
version is a lot easier. There's some doc on this kind of thing at
[2].
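For comparison, a Python sketch of the snapshot cluster in the style of the n-ary relations note [2]: an intermediate node carries the date and the dependencies. The gump:hasSnapshot link, the blank-node labels and the prefixed names are all hypothetical; only containsDependency and date follow the example above.

```python
import itertools

# Sketch of the snapshot pattern: a node (blank-node label) groups the
# dependencies with a date. gump:hasSnapshot is a hypothetical property.
_counter = itertools.count(1)

def snapshot(project, deps, date):
    node = f"_:snap{next(_counter)}"
    triples = {(project, "gump:hasSnapshot", node),
               (node, "gump:date", date)}
    triples |= {(node, "gump:containsDependency", d) for d in deps}
    return triples

def dependencies_on(graph, project, date):
    """Dependencies recorded in the project's snapshot for the given date."""
    nodes = {o for s, p, o in graph if s == project and p == "gump:hasSnapshot"}
    dated = {s for s, p, o in graph if p == "gump:date" and o == date and s in nodes}
    return {o for s, p, o in graph if s in dated and p == "gump:containsDependency"}

g = snapshot("gump:cocoon", ["gump:avalon"], "2004-08-28")
g |= snapshot("gump:cocoon", [], "2004-08-29")
print(dependencies_on(g, "gump:cocoon", "2004-08-28"))  # {'gump:avalon'}
print(dependencies_on(g, "gump:cocoon", "2004-08-29"))  # set()
```

Every query now has to go through the snapshot node, which is the extra complication mentioned above.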

[[
> 2) Ongoing investigations:
> 
> 2.1) I think we wish to define a Gump Ontology at 
> 'http://gump.apache.org/schemas/main/1.0/'? I am still a little confused 
> by OWL and/or RDFS, and I know there is no immediate need to hurry. I
> guess I feel without an Ontology we are speaking a language foreign to 
> everybody, but that is ok as we learn to speak. That said, how do we go 
> about refining this? Just set it out there and tinker?

I would not worry about this for now, just like you don't need an 
XMLSchema to write some well-formed XML.
]]

Yep, tinker.

[[
> 2.2) I think we wish to map the Gump Ontology to DOAP and others (even 
> parts of FOAF). How would we do that

with some OWL ontologies.
]]

and/or RDF schema.
For example, in your schema/ontology you could say:

gump:Project rdfs:subClassOf doap:Project

or

doap:Project rdfs:subClassOf gump:Project

or *both*. By asserting both you'd be saying that every individual in
the set of doap:Projects is also a member of the set of
gump:Projects, and vice versa. You can say the same thing using:

doap:Project owl:equivalentClass gump:Project
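A rough Python sketch of why asserting subClassOf both ways gives the classes the same members. It applies only the RDFS subclass rule (if x is a C and C subClassOf D, then x is a D) until nothing changes; ex:xerces and ex:cocoon are made-up individuals.

```python
# Sketch: apply the RDFS subclass rule repeatedly until no new
# (individual, class) pairs appear. Prefixed names stand in for URIs.
def infer_types(types, subclass_of):
    """types: set of (individual, class); subclass_of: set of (sub, super)."""
    result = set(types)
    changed = True
    while changed:
        changed = False
        for ind, cls in list(result):
            for sub, sup in subclass_of:
                if sub == cls and (ind, sup) not in result:
                    result.add((ind, sup))
                    changed = True
    return result

both_ways = {("gump:Project", "doap:Project"), ("doap:Project", "gump:Project")}
asserted = {("ex:xerces", "doap:Project"), ("ex:cocoon", "gump:Project")}
print(sorted(infer_types(asserted, both_ways)))
# Each individual ends up in both classes, as owl:equivalentClass would imply.
```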

There is another candidate relationship, owl:sameAs, but this says
that the two individuals (in this case classes) are the same. This can
be problematic both conceptually (are you sure the classes are
*exactly* the same?) and sometimes in practice (the DL breed of
reasoners tend to choke). It's not written in stone anywhere, but
folks who generally know what they're doing (like Dan Brickley) tend
to avoid owl:sameAs for this purpose.

You may be able to reuse(/hijack!) some of the DOAP authoring tools.
There's no reason Gump can't use the same syntax style (with equivalent
meaning to your examples):

<Project rdf:about="http://apache.org/gump/project/xml-xerces/"
   xmlns="http://gump.apache.org/schemas/main/1.0/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
    <dependsOn rdf:resource="http://apache.org/gump/project/xjavac/"/>
    <dependsOn rdf:resource="http://apache.org/gump/project/bootstrap-ant/"/>
    <residesWithin rdf:resource="http://apache.org/gump/repository/xml/"/>
    <name>xml-xerces</name>
</Project>
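To check what triples that snippet carries, here is a deliberately limited Python sketch using only xml.etree. It handles just this flat shape (one namespaced node element, rdf:resource links and plain literals); it is not a general RDF/XML parser, for which a real RDF toolkit would be the way to go.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<Project rdf:about="http://apache.org/gump/project/xml-xerces/"
   xmlns="http://gump.apache.org/schemas/main/1.0/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <dependsOn rdf:resource="http://apache.org/gump/project/xjavac/"/>
    <dependsOn rdf:resource="http://apache.org/gump/project/bootstrap-ant/"/>
    <residesWithin rdf:resource="http://apache.org/gump/repository/xml/"/>
    <name>xml-xerces</name>
</Project>"""

def uri(tag):
    # ElementTree writes names as "{namespace}local"; RDF concatenates them.
    # Assumes every element is namespaced.
    namespace, local = tag[1:].split("}")
    return namespace + local

def triples(xml_text):
    """Extract triples from a single flat node element (limited sketch)."""
    root = ET.fromstring(xml_text)
    subject = root.get(f"{{{RDF}}}about")
    # A typed node element implies an rdf:type triple.
    out = [(subject, RDF + "type", uri(root.tag))]
    for child in root:
        obj = child.get(f"{{{RDF}}}resource")
        out.append((subject, uri(child.tag), obj if obj is not None else child.text))
    return out

for t in triples(DOC):
    print(t)
```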


[[
> and how would we test/exercise it?

you don't, you just publish your data in the best way possible and see 
what happens ;-)
]]

Hmm, I dunno, this is one to think about. Assuming you had all the
(combined) project data in a store, what kind of questions could you
ask? What if you stuck the RDF into a reasoner and asserted that project X
(defined only in DOAP) depends on project Y (defined in Gump)? If
dependsOn were declared an owl:TransitiveProperty, X would then inherit
all the dependencies of Y, and the reasoner may be able to spit out X's
dependencies.
Scruffy types tend to play around with this stuff in cwm [3];
those that comb their hair often opt for Protege [4].
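What a reasoner would do with a transitive dependsOn can be approximated by a plain graph walk. A Python sketch, where ex:X is a made-up DOAP-only project and the xml-xerces dependencies come from the RDF/XML example above:

```python
# Sketch: treating dependsOn as transitive (what owl:TransitiveProperty
# would license) means X inherits everything Y depends on.
def all_dependencies(direct, start):
    """Depth-first walk over direct dependsOn edges; returns all transitive deps."""
    seen, stack = set(), [start]
    while stack:
        for dep in direct.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

direct = {
    "ex:X": ["gump:xml-xerces"],  # hypothetical DOAP-only project
    "gump:xml-xerces": ["gump:xjavac", "gump:bootstrap-ant"],
}
print(sorted(all_dependencies(direct, "ex:X")))
# ['gump:bootstrap-ant', 'gump:xjavac', 'gump:xml-xerces']
```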
    
[[
> 2.3) Ought we consider (over time) an ASF-wide Ontology, perhaps 
> defining TLPs/other communities, and having Gump state triples for this 
> project memberOf this community. [We tend to figure out communities from 
> the repository, e.g. cvs.sf.net or ...]

Adam, keep focus: one thing at a time ;-)
]]

Yes and yes ;-)

[[
> 3) Usages:
> 
> 3.1) I was hoping to work on PSP to do queries into the RDBMS. This is 
> primarily for historical information, but I was thinking about using it 
> for dependency information also. The more I think about the RDF
> information, and triple queries, it seems an RDF store might be a better 
> place to hold/maintain and query. This information seems RDF-ish, not 
> RDBMS-ish.

Agreed. I would use a triple store with an RDQL query engine (Redland 
has such a thing and has Python hooks)
]]

Ah, forgot about Redland (again). Yes again.
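For a feel of what an RDQL-style query does against a triple store, here is a plain Python sketch of triple-pattern matching with variables. It is not Redland's API; the query in the comment is only roughly RDQL, and repo:xml is shorthand for the repository URI in the earlier example.

```python
# Sketch of RDQL-style matching in plain Python (NOT Redland's API).
# Terms starting with "?" are variables; prefixed names stand in for URIs.
def match(pattern, triple, binding):
    """Return binding extended so pattern matches triple, or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind a fresh variable, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def query(store, patterns):
    """Return every binding that satisfies all patterns (a nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in store:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

store = {
    ("gump:xml-xerces", "gump:dependsOn", "gump:xjavac"),
    ("gump:xml-xerces", "gump:dependsOn", "gump:bootstrap-ant"),
    ("gump:xml-xerces", "gump:residesWithin", "repo:xml"),
}
# Roughly: SELECT ?p, ?d WHERE (?p, gump:residesWithin, repo:xml), (?p, gump:dependsOn, ?d)
rows = query(store, [("?p", "gump:residesWithin", "repo:xml"),
                     ("?p", "gump:dependsOn", "?d")])
print(sorted(row["?d"] for row in rows))  # ['gump:bootstrap-ant', 'gump:xjavac']
```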

[[
> 3.2) What other 'users' of this descriptor information seem viable? 
> Ought tools (e.g. Depot) be wishing to figure things out from it? Others?
]]

I wouldn't worry too much about that - generate good data and
applications will emerge. The first batch will probably just be pretty
node & arc visualizations, but they can be useful too...

[[
Once the RDF infrastructure is in place, one of my goals is to add 
"legal" metadata to the project and create an inferencing layer that 
indicates whether or not a project is *legal* depending on the 
combination of the licenses.
]]

Kewl.
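A toy Python sketch of the kind of rule such an inferencing layer might apply: flag a project when any transitive dependency carries a license that a compatibility table does not allow. The license names and the table are placeholders, not statements about real licenses, and the per-project license property is hypothetical.

```python
# Toy sketch: license names (lic:A, ...) and COMPATIBLE are placeholders.
COMPATIBLE = {  # project license -> dependency licenses it may combine with
    "lic:A": {"lic:A", "lic:B"},
}

def transitive_deps(deps, start):
    """All projects reachable through dependsOn edges from start."""
    seen, stack = set(), [start]
    while stack:
        for d in deps.get(stack.pop(), ()):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

def problems(project, deps, licenses):
    """Dependencies whose license is not allowed (unknown licenses are flagged too)."""
    allowed = COMPATIBLE.get(licenses.get(project), set())
    return sorted(d for d in transitive_deps(deps, project)
                  if licenses.get(d) not in allowed)

deps = {"ex:app": ["ex:lib1"], "ex:lib1": ["ex:lib2"]}
licenses = {"ex:app": "lic:A", "ex:lib1": "lic:B", "ex:lib2": "lic:C"}
print(problems("ex:app", deps, licenses))  # ['ex:lib2']
```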

Cheers,
Danny.

[1] http://purl.org/stuff/project/
[2] http://www.w3.org/TR/swbp-n-aryRelations/
[3] http://www.w3.org/2000/10/swap/doc/cwm.html
[4] http://protege.stanford.edu/

-- 

http://dannyayers.com
