On 9/25/07, Robert Burrell Donkin <[EMAIL PROTECTED]> wrote:
>
> the license database is essentially interested in facts about concrete
> artifacts. for example, about 'apache-foo-1.0.1.jar'. the only
> reliable way to recognise an artifact is not by its name but by a
> cryptographic hash - for example MD5 aabbcc (yes, i know that MD5 has
> been cracked).


That's going to be expensive. I don't know how big the central Maven repo
is, even reduced to only the artifacts with license info in their POM, but
I'd say still far too big. Computing checksums (a rolling adler-32 might
prove a bit cheaper) means reading every one of those files, which is going
to consume a lot of I/O and CPU cycles, and that's assuming local
computation. Downloading the whole thing would probably take several days.
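
To make the cost concrete, here is a rough sketch of what checksumming one
artifact involves, assuming the file is available locally. The function name
`file_checksums` and the chunk size are only illustrative. Note that it
computes a plain incremental Adler-32 over the whole file, not a rolling
window variant. Either way, every byte of every file has to be read.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1 << 20):
    """Return (md5_hex, adler32) for the file at path, reading it in chunks."""
    md5 = hashlib.md5()
    adler = 1  # Adler-32 is defined to start from 1
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large artifacts never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            adler = zlib.adler32(chunk, adler)
    return md5.hexdigest(), adler & 0xFFFFFFFF
```

Adler-32 is cheaper on CPU than MD5, but the I/O is identical for both.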

Wouldn't a check based both on name and file size be enough? We're not
really concerned about security here and I'd think that name/size
verification would give a fairly low chance of collision.
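
As a sketch of what I mean, assuming the artifacts sit on a local filesystem:
the key below needs only file metadata, so no file content is read. The names
`name_size_key` and `find_collisions` are made up for illustration. The
obvious limitation is that two different files with the same name and the
same size would be treated as the same artifact.

```python
import os


def name_size_key(path):
    """Return (file name, size in bytes) using only filesystem metadata."""
    return os.path.basename(path), os.path.getsize(path)


def find_collisions(paths):
    """Group paths by (name, size); return the keys shared by several paths."""
    seen = {}
    for p in paths:
        seen.setdefault(name_size_key(p), []).append(p)
    return {key: group for key, group in seen.items() if len(group) > 1}
```

Running `find_collisions` over a directory listing would show how often the
name/size key is ambiguous in practice, before committing to either approach.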

Matthieu

> the hash can be used to find RDF claims about that Artifact. in
> particular, a license URI.
>
> the problem with linking this with DOAP classes such as Version and
> Project is that Artifact is currently missing. so we can't really reuse
> the classes. we can reuse subjects, though.
>
> - robert
>