On Mon, Apr 6, 2009 at 4:49 AM, Barbie <[email protected]> wrote:
> The uniqueness of a distribution is the pauseid, the distribution name,
> the distribution version *and* the archive extension.
+1
I'm glad someone is thinking clearly. The path prefix *is* a PAUSE ID
-- regardless of whatever meaning we choose to give to it.
> The latter is
> important as there have been .zip and .tar.gz uploads that match the
> other 3. However, they are still the same distribution release and
Now I'll be pedantic. They *should* be the same distribution, but
there is no guarantee that they are.
> currently reports on either are attributed to the distribution version.
> I only mention it in that if you are going to store the path, then there
> may be more than one. Likewise there have been uploads that have added
> an extra sub-directory into the path, although I don't think that any
> have done it with the same release.
Fortunately, we're not really focusing on "storing" the association --
we're just trying to be clear about which "resource" a CPAN Testers
report refers to.
Here's my summary of the data model:
A "distfile" is a unique path that identifies a file in the "authors"
directory of CPAN (ignoring the "X/XY/" hashing). The first component
of the path is the PAUSE ID. The last component of the path is an
archive file. (I'm ignoring *.pm files for the moment.) Thus, given an
arbitrary distfile with Unix separators (and without a leading slash):
    my @path         = split qr{/}, $distfile;
    my $pause_id     = $path[0];
    my $archive_file = $path[-1];
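Taking the split above one step further, here is a minimal sketch of breaking the archive file name into name, version and suffix. The helper name split_archive, the list of accepted suffixes and the "last hyphen followed by a digit" version heuristic are all my assumptions for illustration, not anything PAUSE guarantees. The sample distfile is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Name-1.23.tar.gz" into (name, version, suffix).
# Returns an empty list if the file name does not fit the assumed pattern.
sub split_archive {
    my ($archive_file) = @_;

    # Assumed set of archive suffixes; extend as needed.
    my ($base, $suffix) =
        $archive_file =~ m{\A(.+)\.(tar\.gz|tar\.bz2|tgz|zip)\z}
        or return;

    # Assumption: the version starts after the last hyphen that is
    # followed by an optional "v" and a digit.
    my ($name, $version) = $base =~ m{\A(.+)-(v?\d.*)\z}
        or return;

    return ($name, $version, $suffix);
}

my $distfile     = 'JOHNDOE/Foo-Bar-1.23.tar.gz';   # made-up example
my @path         = split qr{/}, $distfile;
my $pause_id     = $path[0];
my $archive_file = $path[-1];

my ($name, $version, $suffix) = split_archive($archive_file);
print "$pause_id $name $version $suffix\n";   # JOHNDOE Foo-Bar 1.23 tar.gz
```

Note that a .zip and a .tar.gz upload of the same release differ only in the suffix here, which is why the suffix has to be part of the identity.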
An archive file name should split cleanly into a distribution "name",
a "version" and an archive suffix.
After a chat with rjbs last night, here's my proposal for CT 2.0 for
the time being. CPAN Testers reports in a metabase should use
*either* a distfile resource or a distribution resource:
    cpan:///distfile/pauseid/distname-version.suffix
    cpan:///dist/distname-version
The latter is what CT 1.0 reports are indexed by. For the time being,
we should allow those, and sometime in the future we can use a BackPAN
index to make a best guess at the corresponding distfile. New CT 2.0
clients MUST (in the RFC 2119 sense) submit reports for distfiles, not
dists, so that we have reports associated with the unique resource on
CPAN.
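To make the two resource forms concrete, here is a rough sketch of building and recognizing them. The function names distfile_resource and parse_resource are hypothetical and not taken from any CT 2.0 code; the sketch assumes only the two URI shapes proposed above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a distfile resource from a distfile path
# such as "JOHNDOE/Foo-Bar-1.23.tar.gz" (made-up example).
sub distfile_resource {
    my ($distfile) = @_;
    return "cpan:///distfile/$distfile";
}

# Hypothetical helper: classify a resource string as one of the two
# proposed forms. Returns a hashref, or nothing if neither form matches.
sub parse_resource {
    my ($uri) = @_;

    # New-style: first path component is the PAUSE ID, last is the
    # archive file; any extra sub-directories in between are tolerated.
    if ($uri =~ m{\Acpan:///distfile/([^/]+)/(?:.*/)?([^/]+)\z}) {
        return { type => 'distfile', pause_id => $1, archive_file => $2 };
    }

    # Legacy CT 1.0 style: just "distname-version", no author or suffix.
    if ($uri =~ m{\Acpan:///dist/([^/]+)\z}) {
        return { type => 'dist', dist => $1 };
    }

    return;
}

my $new = parse_resource(distfile_resource('JOHNDOE/Foo-Bar-1.23.tar.gz'));
my $old = parse_resource('cpan:///dist/Foo-Bar-1.23');
print "$new->{type} $new->{pause_id} $new->{archive_file}\n";
print "$old->{type} $old->{dist}\n";
```

A metabase could accept both types on input and treat only the 'distfile' type as identifying a unique upload.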
-- David