On Sun, Apr 05, 2009 at 12:07:03AM -0400, David Golden wrote:

> * The Metabase for CT 2.0 is organized around
> AUTHOR/DISTNAME-VERSION.ARCHIVE (i.e. author + tarball) as the only
> real unique ID on CPAN.
> 
> That's easy enough to fix going forward, but it makes importing
> history difficult -- and it even makes testing the Metabase difficult
> as I have to shave yaks in CPAN::Reporter and Test::Reporter to pass
> the full author/tarball path.
> 
> My thought: get a full list of all tarballs on backpan and create a
> mapping table -- hopefully there are few cases of duplicate
> distname-version.

In fact I don't believe there are any.  I certainly didn't notice mysql
scream about duplicate primary keys as I imported them into my database
for the CPobsoleteAN.  And don't forget zip files.

> Q1: does that exist or could it be produced easily?
> Q2: any thoughts on how that could be either kept up to date or
> web-queryable for ongoing mapping of "version 1" reports as they are
> produced?

Importing the metadata for the current backpan is time-consuming but
simple.  Keeping it up to date is simply a case of running the same
script over the backpan every $time_period and ignoring anything that's
already in the database.
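The "rerun and ignore what's already there" approach can be sketched with Python's built-in sqlite3 module instead of MySQL. Each periodic scan inserts every (tarball, author) pair it finds. The primary key on tarball, together with INSERT OR IGNORE, means that only new uploads are added. A single-row lookup covers the "map a version 1 report to its author" query. The table name, the column names and the helper functions are hypothetical:

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) the mapping database. The table name and its
    columns are illustrative assumptions."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dists ("
        " tarball TEXT PRIMARY KEY,"  # DISTNAME-VERSION.ARCHIVE
        " author TEXT NOT NULL)"
    )
    return conn

def refresh(conn, entries):
    """Insert (tarball, author) pairs from a fresh BackPAN scan.
    Tarballs already present are silently skipped, so the whole
    archive can be rescanned every $time_period. Returns the number
    of rows actually added."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO dists (tarball, author) VALUES (?, ?)",
            entries,
        )
    return conn.total_changes - before

def author_for(conn, tarball):
    """Look up the author for a bare DISTNAME-VERSION.ARCHIVE, or None."""
    row = conn.execute(
        "SELECT author FROM dists WHERE tarball = ?", (tarball,)
    ).fetchone()
    return row[0] if row else None
```

In this sketch a duplicate filename from a second author is ignored rather than reported. If that case matters, compare the scan against the table before inserting.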

Not sure how much of my code will be relevant, but ...

http://www.cantrell.org.uk/cgit/cgit.cgi/cpxxxan/

-- 
David Cantrell | http://www.cantrell.org.uk/david

    Erudite is when you make a classical allusion to a
    feather.  Kinky is when you use the whole chicken.
