Big flashing lights on my radar - I'm a fan of RDF, and Gump is an exceedingly neat idea. So one or two comments on top of Stefano's comments (sorry about the quoting):

[[
> basically this is like the RSS and Atom feeds that Gump puts out,
> except they also have data at the module level (for all projects within
> module). Basically, I figured that folks might sometimes want specific
> information, and sometimes want it all (to feed into some store).

Kewl.
]]

Yep.

[[
> I started out pretty simply, Gump defines some classes (Project,
> Repository) and some properties (e.g. name) and then makes some
> statements (Project:X depends upon Project:Y, Project:X resides within
> Repo:Z). Nothing complicated, but a start.

nice.
]]

The vocab is nice. A while back I did some work on a project vocab [1] but spent far too much time on the terms and not enough on doing stuff with it - got bogged down, bloat, overengineered. So I reckon just starting with a few terms and actually *using* them is the best route forward.

[[
> Even this small foray allowed me to come up with some questions, and
> want more input:

<semantic-web-hat mode="on">

Copying Dirk since he's a semweb fan as much as I am.

> Some areas to look into:
>
> 1) Design Decisions/Questions:
>
> 1.1) Ought we define the URI for a project (or other entity) to point to
> the standalone RDF for that entity? I'm sure there is no need to, but it
> might allow tools to discover upon demand.

This would be a URL and my suggestion would be something like

http://gump.apache.org/data/path/project/20040827
]]

It could well be helpful, but I'd suggest being careful about what statements are made involving the URI - i.e. does that resource actually identify the project? If so, a direct rdf:about to it is ok, but otherwise a technique that's getting popular is to have an rdfs:seeAlso to it. There's not the same level of commitment, but automatic tools can still pick up the data. (Other statements such as the type of the resource can also be made, but aren't necessary from day one.)

[[
> 1.2) What if there are two sources of RDF triples about an entity?
> Say we have facts in a standalone document, and in a shared one (or in a
> triple store)? Are triples merged?

Yes, that's the beauty of the RDF model: you can have statements coming from different sources, and they get aggregated.

> What if they clash with each other?
> [e.g. one source says X dependsOn Y, but another says Y dependsOn X or
> something contradictory?]

[snip]
]]

Just to expand on Stefano's comments a little - the basic RDF model would in effect see both/all the statements as being true. ANDed if you like. There are tricks (particularly at the OWL level) which can be used to spot inconsistencies using general-purpose inferencing, but then again there's nothing to stop more application-specific reasoning being coded on top of the RDF data model.

[[
> 1.3) How do we define a URI to represent a long lived (yet varying)
> entity?

eheh, great question ;-)
]]

Easy! My home page is http://dannyayers.com. The representations of it (the HTML & RSS) vary a lot, but conceptually it's the same entity.

[[
> Ought we (say) include the version of Cocoon in the URI, so we
> know facts about that release/state, or do we just say Cocoon?

I'm a big fan of numerical URIs for long-term persisting things. The less implicit semantics in the URI, the higher the chance of surviving changes without requiring the URI to change.
]]

If I understand Stefano correctly, I agree - if it's worth saying, make it explicit in some other way as additional statements; the URI is opaque to (most) machine processing.

[[
> If Cocoon dependsOn Avalon today, but not tomorrow, what happens to the
> Cocoon dependsOn Avalon triple? Is it wrong? Expired?

This is where it starts to get very tricky.
]]

Yup ;-)

[provenance through reification snipped]

[[
Well, I would just create a new model every time, ...
]]

Yep, that's the easiest: only retain the current version of the model/graph/document at the location which is going to have its data processed.
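
To make the merging and "new model every time" points a bit more concrete, here's a minimal sketch in plain Python that treats a graph as a set of (subject, predicate, object) tuples. Everything named here (`merge`, `expired`, the `gump:` identifiers) is made up for illustration and isn't anything Gump actually defines. It also ignores blank nodes, which a real RDF library would have to handle when merging:

```python
# A graph is modelled as a set of (subject, predicate, object) triples.
# All identifiers below are hypothetical examples, not real Gump terms.

def merge(*graphs):
    """Merging graphs is a set union: every statement counts as asserted."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def expired(previous, latest):
    """Triples 'expire' simply by being absent from the latest graph."""
    return previous - latest

standalone = {("gump:cocoon", "gump:dependsOn", "gump:avalon")}
shared = {
    ("gump:cocoon", "gump:dependsOn", "gump:avalon"),
    ("gump:avalon", "gump:dependsOn", "gump:cocoon"),
}

# Both 'clashing' dependsOn statements survive the merge; duplicates collapse.
combined = merge(standalone, shared)

# The next run publishes a fresh graph that replaces the previous one.
today = combined
tomorrow = {("gump:cocoon", "gump:residesWithin", "gump:repo-cocoon")}
gone = expired(today, tomorrow)
```

Nothing in the merge step notices that the two dependsOn statements pull in opposite directions; spotting that would be a job for a reasoner or for application code on top.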

Another alternative would be to wrap up the dependencies in a little cluster with a timestamp, something like:

  Snapshot containsDependency Avalon
  Snapshot date 2004-08-28

I suspect this may be complicating matters unnecessarily though - letting the triples 'expire' through their absence in the latest version is a lot easier. There's some doc on this kind of thing at [2].

[[
> 2) Ongoing investigations:
>
> 2.1) I think we wish to define a Gump Ontology at
> 'http://gump.apache.org/schemas/main/1.0/'? I am still a little confused
> by OWL and/or RDFS, and I know there is no immediate need to hurry. I
> guess I feel without an Ontology we are speaking a language foreign to
> everybody, but that is ok as we learn to speak. That said, how do we go
> about refining this? Just set it out there and tinker?

I would not worry about this for now, just like you don't need an XMLSchema to write some well-formed XML.
]]

Yep, tinker.

[[
> 2.2) I think we wish to map the Gump Ontology to DOAP and others (even
> parts of FOAF). How would we do that?

with some OWL ontologies.
]]

and/or RDF schema. For example, in your schema/ontology you could say:

  gump:Project rdfs:subClassOf doap:Project

or

  doap:Project rdfs:subClassOf gump:Project

or *both*. By asserting both you'd be saying that every individual in the set of doap:Projects is also a member of the set of gump:Projects, and vice versa. You can say the same thing using:

  doap:Project owl:equivalentClass gump:Project

There is another candidate relationship, owl:sameAs, but this says that the two individuals (in this case classes) are the same. This can be problematic both conceptually (are you sure the classes are *exactly* the same?) and sometimes in practice (the DL breed of reasoners tend to choke). It's not written in stone anywhere, but folks who generally know what they're doing (like Dan Brickley) tend to avoid owl:sameAs for this purpose.

You may be able to reuse(/hijack!) some of the DOAP authoring tools.
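
Going back to the both-directions subClassOf point for a second, here's a rough sketch in plain Python of what a reasoner would do with it. It applies just the one RDFS rule (if x has type C and C is a subclass of D, then x has type D) until nothing new turns up. The `ex:` individuals and the helper names are hypothetical, and a real reasoner covers far more than this single rule:

```python
# Tiny forward-chaining sketch of the RDFS subclass rule:
#   (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
# The ex: individuals are hypothetical examples.

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def infer_types(triples):
    """Apply the subclass rule repeatedly until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for (s, p, o) in inferred if p == SUBCLASS]
        for (x, p, cls) in list(inferred):
            if p != TYPE:
                continue
            for (sub, sup) in subclass_pairs:
                new = (x, TYPE, sup)
                if sub == cls and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred

def members(triples, cls):
    """All subjects stated (or inferred) to have the given type."""
    return {s for (s, p, o) in triples if p == TYPE and o == cls}

data = {
    ("ex:xerces", TYPE, "gump:Project"),
    ("ex:redland", TYPE, "doap:Project"),
    # Asserting the subclass relation in both directions.
    ("gump:Project", SUBCLASS, "doap:Project"),
    ("doap:Project", SUBCLASS, "gump:Project"),
}

closed = infer_types(data)
gump_members = members(closed, "gump:Project")
doap_members = members(closed, "doap:Project")
```

After inference the two classes have exactly the same members. Drop one of the two subClassOf statements and the membership only flows one way.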

There's no reason Gump can't use the same syntax style as DOAP (with equivalent meaning to your examples):

  <Project rdf:about="http://apache.org/gump/project/xml-xerces/"
           xmlns="http://gump.apache.org/schemas/main/1.0/"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <dependsOn rdf:resource="http://apache.org/gump/project/xjavac/"/>
    <dependsOn rdf:resource="http://apache.org/gump/project/bootstrap-ant/"/>
    <residesWithin rdf:resource="http://apache.org/gump/repository/xml/"/>
    <name>xml-xerces</name>
  </Project>

[[
> and how would we test/exercise it?

you don't, you just publish your data in the best way possible and see what happens ;-)
]]

Hmm, I dunno, this is one to think about. Assuming you had all the (combined) project data in a store, what kind of questions could you ask? What if you stuck the RDF into a reasoner and asserted project X (defined only in DOAP) depends on project Y (defined in Gump)? Presumably X would then inherit all the dependencies of Y (assuming dependsOn is declared transitive, e.g. as an owl:TransitiveProperty). The reasoner may be able to spit out X's dependencies. Scruffy types tend to play around with this stuff in cwm [3]; those that comb their hair often opt for Protege [4].

[[
> 2.3) Ought we consider (over time) an ASF-wide Ontology, perhaps
> defining TLPs/other communities, and having Gump state triples for this
> project memberOf this community. [We tend to figure out communities from
> the repository, e.g. cvs.sf.net or ...]

Adam, keep focus: one thing at a time ;-)
]]

Yes and yes ;-)

[[
> 3) Usages:
>
> 3.1) I was hoping to work on PSP to do queries into the RDBMS. This is
> primarily for historical information, but I was thinking about using it
> for dependency information also. The more I think about the RDF
> information, and triple queries, it seems an RDF store might be a better
> place to hold/maintain and query. This information seems RDF-ish, not
> RDBMS-ish.

Agreed.
I would use a triple store with an RDQL query engine (Redland has such a thing and has Python hooks)
]]

Ah, forgot about Redland (again). Yes again.

[[
> 3.2) What other 'users' of this descriptor information seem viable?
> Ought tools (e.g. Depot) be wishing to figure things out from it? Others?
]]

I wouldn't worry too much about that - generate good data and applications will emerge. The first batch will probably just be pretty node & arc visualizations, but they can be useful too...

[[
Once the RDF infrastructure is in place, one of my goals is to add "legal" metadata to the project and create an inferencing layer that indicates whether or not a project is *legal* depending on the combination of the licenses.
]]

Kewl.

Cheers,
Danny.

[1] http://purl.org/stuff/project/
[2] http://www.w3.org/TR/swbp-n-aryRelations/
[3] http://www.w3.org/2000/10/swap/doc/cwm.html
[4] http://protege.stanford.edu/

--
http://dannyayers.com
