On 9/24/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
> On 9/24/07, Robert Burrell Donkin <[EMAIL PROTECTED]> wrote:

<snip>

> > > Good idea. Although do you think that format should live side-by-side with
> > > DOAP?
> >
> > in terms of data storage?
>
>
> In terms of data format. Do we want to formalize the set of characteristics
> each license has?

the good thing about RDF is that there's no real need to overformalise,
but there is certainly a minimum amount of data that's going to be
needed before the analytics can work. probably best to drive this
analysis from use cases.

> Also I've tried to see this week-end how I could change my little dumb
> format to DOAP and there seems to be some sort of mismatch between the Maven
> artifact view of the world and the DOAP project view of it. A mapping could
> be:
>
> groupId -> Project/name
> artifactId -> Project/release/Version/file-release
>
> I could group all release versions / artifacts from a Maven repository to
> get a complete project descriptor. But then the groupId strikes me as a
> particularly ugly project name (i.e. org.apache.tomcat instead of Tomcat).

perhaps groupId and artifactId are really more like components of a URI:

mvn://org.apache.tomcat#servlet-api-2.1.jar

reasonably uniquely describes a single artifact.
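
a rough sketch of building and splitting such a URI, in python. the
mvn:// scheme is only the ad hoc notation from the example above, not
a registered scheme, and the function names are made up for
illustration:

```python
# The "mvn://" scheme and both helper names are hypothetical; they only
# mirror the example URI given in this message.
PREFIX = "mvn://"


def artifact_uri(group_id: str, file_name: str) -> str:
    """Combine a Maven groupId and an artifact file name into one URI."""
    if not group_id or not file_name:
        raise ValueError("both groupId and file name are required")
    return f"{PREFIX}{group_id}#{file_name}"


def split_artifact_uri(uri: str) -> tuple[str, str]:
    """Recover (groupId, file name) from a URI built by artifact_uri()."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a {PREFIX} URI: {uri!r}")
    group_id, sep, file_name = uri[len(PREFIX):].partition("#")
    if not sep or not group_id or not file_name:
        raise ValueError(f"missing groupId or file name: {uri!r}")
    return group_id, file_name


if __name__ == "__main__":
    uri = artifact_uri("org.apache.tomcat", "servlet-api-2.1.jar")
    print(uri)
    print(split_artifact_uri(uri))
```

this only stays unique while the groupId itself is unique enough.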

> The thing gets kind of clunky. Also Maven repositories are not known for
> their good sanity.

yep - the quantity and quality of the meta-data varies

> Sometimes you have an artifact name, sometimes not.

this may work ok with a URI, provided that the groupId is unique enough

> Sometimes you have a license name, sometimes a url, sometimes both.

i suspect that those specified by a name will come from a very limited
list. RDF needs a URI, but perhaps we could cleanse the data on the way
in by searching the licenses in the system in order and matching by
name.
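
something like this, as a minimal python sketch. the list of known
licenses, the normalisation rule and the function names are all
assumptions for illustration, and the second entry is a placeholder
rather than a real license:

```python
import re
from typing import Optional

# Licenses "in the system", searched in order. Contents are illustrative;
# the second entry is a made-up placeholder.
KNOWN_LICENSES = [
    ("Apache License, Version 2.0",
     "http://www.apache.org/licenses/LICENSE-2.0"),
    ("Foo License", "http://example.org/licenses/foo"),
]


def normalise(name: str) -> str:
    """Lower-case and drop punctuation so small spelling variations match."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def license_uri(raw_name: str, known=KNOWN_LICENSES) -> Optional[str]:
    """Return the URI of the first known license whose name matches."""
    wanted = normalise(raw_name)
    if not wanted:
        return None
    for name, uri in known:
        if normalise(name) == wanted:
            return uri
    return None  # unknown name: leave for a human to clean up


if __name__ == "__main__":
    print(license_uri("apache license version 2.0"))
    print(license_uri("Some Unknown License"))
```

names that match nothing are returned as None rather than guessed at.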

> So either we let go of DOAP, or we let go of all the nice data already
> available in Maven repositories, or we do a best effort even if it ends up
> being ugly and think of a way for people to clean up their own project info
> (in the same way they could submit license info for their projects).
>
> Opinions?

perhaps looking from a different perspective may be useful...

the license database is essentially interested in facts about concrete
artifacts. for example, about 'apache-foo-1.0.1.jar'. the only
reliable way to recognise an artifact is not by its name but by a
cryptographic hash - for example MD5 aabbcc (yes, i know that MD5 has
been cracked).

the hash can be used to find RDF claims about that Artifact. in
particular, a license URI.
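
a small python sketch of that lookup, with the claims held as plain
(subject, predicate, object) tuples instead of a real RDF store. the
urn:md5: subject form, the license predicate URI and the function
names are assumptions made up for the example:

```python
import hashlib

# Hypothetical predicate; no such vocabulary has been agreed on.
LICENSE_PREDICATE = "http://example.org/terms#license"


def artifact_subject(data: bytes, algorithm: str = "md5") -> str:
    """Name an artifact by the hash of its bytes, not by its file name."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"urn:{algorithm}:{digest}"


def license_claims(triples, data: bytes, algorithm: str = "md5") -> list:
    """Return every license URI claimed for the artifact with these bytes."""
    subject = artifact_subject(data, algorithm)
    return [obj for subj, pred, obj in triples
            if subj == subject and pred == LICENSE_PREDICATE]


if __name__ == "__main__":
    jar = b"stand-in for the bytes of apache-foo-1.0.1.jar"
    claims = [(artifact_subject(jar), LICENSE_PREDICATE,
               "http://www.apache.org/licenses/LICENSE-2.0")]
    print(license_claims(claims, jar))
    print(license_claims(claims, b"some other artifact"))
```

the algorithm is a parameter, so a stronger hash such as sha256 could be
swapped in for MD5.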

the problem with linking this to DOAP classes such as Version and
Project is that Artifact is currently missing. so we can't really reuse
the classes. we can reuse subjects, though.

- robert
