Re: [darcs-users] RDF metadata for patch files

Drew Perttula Wed, 25 Mar 2009 00:46:50 -0700

Hi- I'm a big fan of RDF, and I use it on all sorts of projects. Here are my opinions on Max's questions, plus some more strawmen to keep the discussion going:


Formats and RDF store capabilities:

Output RDF/XML for many of the same reasons the other darcs commands have an --xml flag. I think n3 may not be needed at all, if you like the property-only approach I recommend below.

Don't bother with sparql or a full triplestore unless you have an especially great library to link in. In practice, people like me will just want darcs to emit its RDF data (for one patch or for all of them) so we can transfer it to our external store of choice, e.g. Sesame. Over in *that* store, I'll probably have loaded in other related data too. That store becomes responsible for executing queries quickly, etc. To keep the data fresh, I might try to use some darcs hook to run my resync-data-for-this-patch tool. Hopefully there is a suitable hook that fires my program whenever a patch's metadata changes.



Patch properties:

I think it would be elegant to design the RDF system as a superset of the current metadata system, which means 'author', 'comment', 'name', and 'date' should be mapped to some RDF terms. Phase 1 would be to make 'darcs cha --rdf' emit an RDF/XML document that contains the same data as the current --xml output. A big part of phase 1 will be to come up with the URI for a patch (see below).
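
To make phase 1 concrete, here is a minimal Python sketch of what emitting RDF/XML for one patch might look like. The mapping of darcs fields to Dublin Core terms (author to dc:creator, name to dc:title, comment to dc:description) and the function name are my assumptions for illustration, not anything darcs implements.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from darcs metadata fields to Dublin Core terms.
FIELD_TO_TERM = {
    "author": "creator",
    "name": "title",
    "comment": "description",
    "date": "date",
}


def patch_to_rdfxml(patch_uri, metadata):
    """Serialize one patch's metadata as an RDF/XML string.

    patch_uri is the subject of every statement; metadata is a dict that
    may contain any of the keys in FIELD_TO_TERM.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": patch_uri}
    )
    for field, term in FIELD_TO_TERM.items():
        if field in metadata:
            # Each present field becomes one literal-valued property.
            ET.SubElement(desc, f"{{{DC_NS}}}{term}").text = metadata[field]
    return ET.tostring(root, encoding="unicode")
```

Fields that are absent from the dict are simply skipped, so the same function would cover patches with no long comment.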

Then, for phase 2, 'darcs add' could gain some general-purpose flags that let the user submit arbitrary additional edges off the patch URI. Here's where we can connect the patch to a license, a bug ticket, hours spent, see-also links, calendar events, etc. I think accepting an arbitrary RDF graph with any amount of new structure may overwhelm users, but implementing flags like --license sounds too limited.

Again, all of these proposals are for discussion purposes only, but the CLI could look like this:


darcs rec --property_literal dc:language "en" \
          --property_uri rdfs:seeAlso http://company.com/docs/feature1 \
          --property_uri rdfs:seeAlso http://company.com/docs/deprecationStandard

'--author' becomes a synonym for '--property_literal dc:creator', etc. I'm imagining that darcs would know a fixed set of prefixes (like 'dc') for convenience, but that it would still be able to accept arbitrary URIs for the predicate (aka 'property' aka 'edge label').
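
The prefix handling could be sketched roughly like this in Python. The prefix table, the function name, and the "contains ://" test for a full URI are all assumptions I'm making for illustration; a real implementation would want proper URI validation.

```python
# Assumed fixed set of prefixes that the tool would know out of the box.
KNOWN_PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_predicate(term):
    """Turn 'prefix:local' into a full URI; pass full URIs through unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return KNOWN_PREFIXES[prefix] + local
    if "://" in term:
        # Crude check: treat anything with a scheme separator as a full URI.
        return term
    raise ValueError(f"unknown prefix or malformed URI: {term!r}")
```

With this, 'dc:creator' and the spelled-out Dublin Core URI name the same predicate, and an unrecognized prefix fails loudly instead of producing a bogus edge.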



Per-file metadata:

As to per-file metadata in darcs, I think that's not necessary. There are already ways to embed the file metadata in the file in many cases. You can also split your patch into a few pieces (and then combine them in a tag, perhaps) if you need the extra granularity.



Patch URIs:

In the output graph, what is the subject (aka 'source') of an edge like dc:creator? RDF wants this to be a URI; darcs already has its hash codes. Here are some possibilities:

A: http://darcs.org/patch/20090323070000-ecde5-cd5fdd37119bcd748942a0bf3d346d1d8da2a9f9
B: http://some-url-root-you-entered.com/some/path/20090323070000-ecde5-cd5fdd37119bcd748942a0bf3d346d1d8da2a9f9
C: http://your-darcsweb-repo.com/darcs/?r=projname;a=commit;h=20090228090242-312f9-c37d395e337108a7a224650414bc18a58e263481.gz

[A] is automatic, and seems like the Simplest Thing that Could Possibly Work. darcs.org may get pounded with futile requests to resolve the URIs, though.

[C] is a special case of [B] (plus .gz at the end), and it's cool because the URLs would be resolvable. That's a desirable property of RDF URIs, though never a requirement.
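
Options [A] and [B] differ only in the root, so a single helper with a default could cover both. This is a strawman in Python; the function name and the idea of a user-configurable root are assumptions, not existing darcs behavior.

```python
def patch_uri(patch_hash, root="http://darcs.org/patch/"):
    """Build a patch URI by appending the darcs patch hash to a URI root.

    The default root gives option [A]; passing your own root gives option [B].
    """
    if not root.endswith("/"):
        root += "/"
    return root + patch_hash
```

Option [C] would need a different join, since darcsweb puts the hash in a query string instead of the path.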



More use cases:

How do I link a bug ticket with a darcs patch that fixes it? There are many ad-hoc schemes that involve putting the link id into the patch comment text, but I think the problems there are obvious. I'd like to say "this patch fixes bug http://mycompany.com/jira/FOO-345" and then in another UI, be able to jump from that bug to the darcsweb display of the related patch(es).

I'd love to have an hours-spent value on my patches. Suppose I got my IDE^H^H^H editor and shells to watch how long I was active on which project, and the output was available to my 'darcs rec' wrapper. This would be awesome data to stick in the repo.

On my web project, I have a lot of patches that say "implemented feature Z, see http://theproject.com/demo/of/Z for an example". Someday I might add a feature where if you're in admin mode, you can jump from a page on the site back to the list of tickets (and therefore bugs and discussions) that were involved in that page.

I already use URIs for my tag names, so that other systems (e.g. a release notes generator) could make more statements about the tags. I'd be happy for darcs to be making its own URIs for those tags so I can use the comment for free-form text again. As with all RDF, this is not impossible or even difficult to do with tag names or their hash ids, but it's easier to deal with tons of data sources when they're all in the same address space (URIs).

It might be cool to link a patch to the results of a test suite that ran on that code. This would help us make UIs that let you jump from the point when the tests started taking too long back to the tags just before and after that event.



Workarounds:

There is nothing too hard about writing my own RDF and sticking it at the end of each darcs comment as XML. It would be easy to find and parse such a thing back into a triple store. One could even use XSLT to convert the output of 'darcs cha --xml' into RDF that would mesh with the new statements inside the comments. So, it seems that this proposal is more about formalizing the current metadata in a standard way and less about offering a way to store license data in darcs.
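
As a rough illustration of that workaround, here is a Python sketch (instead of XSLT) that turns changelog XML into triples. The sample document is hand-written and only assumes the general shape of 'darcs cha --xml' output (a patch element with author, date, and hash attributes plus name and comment children); the Dublin Core mapping, the URI root, and the email address are likewise my assumptions.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Hand-written sample; assumed to resemble 'darcs cha --xml' output.
SAMPLE = """<changelog>
<patch author='drew@example.com' date='20090323070000'
       hash='20090323070000-ecde5-cd5fdd37119bcd748942a0bf3d346d1d8da2a9f9'>
  <name>implemented feature Z</name>
  <comment>see http://theproject.com/demo/of/Z for an example</comment>
</patch>
</changelog>"""


def changelog_to_triples(xml_text, uri_root="http://darcs.org/patch/"):
    """Return (subject, predicate, literal) tuples for every patch element."""
    triples = []
    for patch in ET.fromstring(xml_text).iter("patch"):
        subject = uri_root + patch.get("hash", "")
        values = {
            "creator": patch.get("author"),
            "date": patch.get("date"),
            "title": patch.findtext("name"),
            "description": patch.findtext("comment"),
        }
        for term, value in values.items():
            if value is not None:  # skip fields the patch does not carry
                triples.append((subject, DC + term, value))
    return triples
```

The resulting tuples could then be loaded into whatever external store you prefer, alongside any extra statements parsed out of the comments.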



Regarding advocacy:

The inkscape/SVG example is nice, but a more convincing demonstration of 'critical mass' is PDF. Many PDF files (especially ones from acrobat, I think) have RDF/XML embedded in them. http://www.xml.com/pub/a/2004/09/22/xmp.html

It might help to never say the words 'semantic web', since like many RDF applications out there, this has nothing to do with semantics (beyond basic stuff, e.g. how darcs has a meaning for the term 'author'). It also doesn't involve any kind of web until users want to connect their data sources together. That will work really well, but it's not necessarily one of the goals of RDF-in-darcs.

