I've spent the last week getting intimate with the CPAN metadata, or in many
cases, the lack thereof, and want to share what I've learned.

First, some background. We're trying to implement a script called
"cpan2efs", which will essentially do what cpanp/cpanm do when installing a
single module, and then some. Primarily, it maps one or more module names to
distributions, determines their dependencies, and installs any missing
dependencies before installing the requested module(s). Sounds
simple, right?
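The ordering part of that can be sketched as a depth-first traversal of the
dependency graph, emitting each distribution only after everything it needs.
This is a minimal Python sketch, not cpan2efs code: `install_order` and the
shape of its inputs (a module-to-distribution map, presumably built from
02packages.details.txt.gz, and a distribution-to-required-modules map) are
hypothetical, and it assumes `perl` itself and core modules have already been
filtered out of the requirements.

```python
def install_order(requested, module_to_dist, dist_deps, installed=frozenset()):
    """Return distributions so that every dependency precedes its dependents.

    requested:      module names the user asked for
    module_to_dist: module name -> distribution (hypothetical input)
    dist_deps:      distribution -> module names it requires (hypothetical input)
    installed:      distributions already available, which are skipped
    """
    order, state = [], {}  # state[dist] is "visiting" or "done"

    def visit(dist):
        if dist in installed or state.get(dist) == "done":
            return
        if state.get(dist) == "visiting":
            # Seen again while still resolving its own dependencies.
            raise ValueError(f"dependency cycle through {dist}")
        state[dist] = "visiting"
        for module in dist_deps.get(dist, ()):
            visit(module_to_dist[module])
        state[dist] = "done"
        order.append(dist)  # post-order: all dependencies were appended first

    for module in requested:
        visit(module_to_dist[module])
    return order
```

Because several modules can live in one distribution, the traversal works on
distributions, so a dependency on `Bar::Util` and one on `Bar` both resolve to
a single install of the same release.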

For modules that have META.yml files, this is usually pretty
straightforward, provided the META.yml contents are correct. However, there
are currently 20831 active (meaning: listed in 02packages.details.txt.gz)
distributions, and only about 75% of those have META.yml files. The problem
is how to deal with the other 25%.
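Getting the list of active distributions is the easy part. A rough Python
sketch, assuming the usual 02packages layout (a header block of `Key: value`
lines, a blank line, then one `module version path` line per package, with
the path relative to authors/id); `active_distributions` and `meta_coverage`
are hypothetical names:

```python
import gzip


def active_distributions(packages_path):
    """Return the set of distribution paths listed in 02packages.details.txt.gz."""
    dists = set()
    with gzip.open(packages_path, "rt", encoding="utf-8", errors="replace") as fh:
        # Skip the header: 'Key: value' lines terminated by a blank line.
        for line in fh:
            if not line.strip():
                break
        # Each remaining line is: module name, version, distribution path.
        for line in fh:
            fields = line.split()
            if len(fields) >= 3:
                dists.add(fields[2])  # many modules map to the same distribution
    return dists


def meta_coverage(dists, have_meta):
    """Fraction of active distributions for which we already have a META.yml."""
    return len(dists & set(have_meta)) / len(dists) if dists else 0.0
```

Counting distinct paths rather than lines matters, since a single
distribution usually provides several packages.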

Modules like CPAN::FindDependencies, which got the initial cpan2efs
implementation working, figure out the dependencies by running the
Makefile.PL and parsing its output, or by parsing the Makefile.PL directly.
This is problematic, because a lot of modules have obnoxiously interactive
Makefile.PL files that don't fall back to reasonable defaults. In many cases,
they go into infinite loops when you don't answer the questions they ask.
There's a very long list of special cases to deal with here, if one were to
attempt to automate this. CPAN::FindDependencies also merges all
the dependencies into a single list, while we really need them separated
into runtime, build and test dependencies.
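For distributions that do ship metadata, separating the phases is mostly a
matter of reading the right keys. A sketch, assuming the META file has
already been parsed into a dict by whatever YAML or JSON parser you prefer:
version 2 metadata (CPAN::Meta::Spec, usually META.json) nests requirements
under `prereqs` by phase, while 1.x META.yml only has flat `requires`,
`build_requires` and `configure_requires` keys, so test dependencies there
typically end up folded into `build_requires`. `split_prereqs` is a
hypothetical name.

```python
def split_prereqs(meta):
    """Group hard requirements ('requires') by phase from a parsed META dict."""
    phases = {"configure": {}, "build": {}, "test": {}, "runtime": {}}
    if "prereqs" in meta:
        # Spec v2: prereqs -> phase -> relationship -> {module: version}.
        for phase, relationships in meta["prereqs"].items():
            if phase in phases:  # ignores e.g. the 'develop' phase
                phases[phase].update((relationships or {}).get("requires") or {})
    else:
        # Spec 1.x: flat keys and no separate test phase.
        phases["runtime"].update(meta.get("requires") or {})
        phases["build"].update(meta.get("build_requires") or {})
        phases["configure"].update(meta.get("configure_requires") or {})
    return phases
```

Only `requires` is collected; `recommends` and `suggests` are optional and
would not block an install.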

Tools like cpanp/cpanm can easily handle this by parsing the Makefile.PL
output and then recursively installing those modules, but that only works
because everything is being installed into the same target directory. In our
case, we need to install each distribution into its own EFS release,
so that approach doesn't work so well.

I've spent the last day trying to figure out if we can solve the metadata
problem on the CPAN side. We have created a CPAN::Mini mirror at
madefsd01:/home/minicpan/latest, and I have been trying to build a complete
cache of the META.yml files, so that our installation tools don't have to
generate them dynamically. This has had very mixed results so far. First, I
simply retrieved any existing META.yml files over the web from
search.cpan.org, if they were found. Then I started processing the
Makefile.PLs, and that's where things just fall apart. Before you even get
through the authors starting with A, you encounter 5 or 6 modules that can't
be processed automatically. My strategy was to run:

   perl Makefile.PL && make metafile

or the Build.PL equivalent, and then copy the resulting META.yml into my
cache. With almost 5000 META.yml files missing, this is going to be
time-consuming.
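One way to keep the interactive Makefile.PLs from hanging a batch run is to
give them no terminal and a hard time limit. A sketch under those
assumptions: `PERL_MM_USE_DEFAULT=1` makes ExtUtils::MakeMaker's `prompt()`
take the default answer, stdin is /dev/null so a bare `<STDIN>` read sees
EOF, and each step is killed after a timeout, which also catches the scripts
that loop forever on EOF. The function names are hypothetical, preferring
Makefile.PL over Build.PL is an arbitrary choice, and copying the result into
the cache is left out, since where META.yml lands can vary between MakeMaker
and Module::Build versions.

```python
import os
import subprocess
import sys


def choose_steps(files):
    """Pick the configure + metadata commands, given the top-level file names."""
    if "Makefile.PL" in files:
        return [["perl", "Makefile.PL"], ["make", "metafile"]]
    if "Build.PL" in files:
        return [["perl", "Build.PL"], ["perl", "Build", "distmeta"]]
    return []


def run_noninteractive(cmd, cwd, timeout=120):
    """Run one step with no terminal; True only on exit status 0 within timeout."""
    # PERL_MM_USE_DEFAULT tells MakeMaker's prompt() to accept its default.
    env = dict(os.environ, PERL_MM_USE_DEFAULT="1")
    try:
        result = subprocess.run(
            cmd, cwd=cwd, env=env,
            stdin=subprocess.DEVNULL,   # any direct read from STDIN sees EOF
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,            # the child is killed if it runs too long
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def generate_metafile(dist_dir, timeout=120):
    """Configure an unpacked distribution and ask it to write META.yml.
    Stops at the first failing step, like `&&` in the shell."""
    steps = choose_steps(set(os.listdir(dist_dir)))
    return bool(steps) and all(run_noninteractive(c, dist_dir, timeout) for c in steps)


if __name__ == "__main__":
    for d in sys.argv[1:]:
        print(d, "ok" if generate_metafile(d) else "needs manual attention")
```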

What I'm still trying to do is create a script that can run after we've
updated a minicpan repo, that incrementally processes new archives, and
extracts the META.yml file, if included, and if not, attempts to generate
one.  I think this approach can work, and then we can code cpan2efs to use
META.yml exclusively.
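A rough sketch of that post-mirror pass, assuming the standard CPAN::Mini
layout (archives under authors/id/A/AU/AUTHOR/), a plain text file recording
which archives were already seen, and a cache that mirrors the archive paths.
Only tar-based archives are handled (.zip ones would need zipfile), and
`process_new_archives`, `find_meta_member` and the cache naming are all
hypothetical. The archives it returns are the ones without a META.yml, which
still need one generated or manual intervention.

```python
import tarfile
from pathlib import Path

TAR_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2")


def find_meta_member(tar):
    """Return the shallowest META.yml (top level or one directory down), or None."""
    best = None
    for member in tar.getmembers():
        name = member.name[2:] if member.name.startswith("./") else member.name
        parts = name.split("/")
        if member.isfile() and parts[-1] == "META.yml" and len(parts) <= 2:
            if best is None or len(parts) < best[0]:
                best = (len(parts), member)
    return best[1] if best else None


def process_new_archives(mirror_root, cache_dir, seen_file):
    """Cache META.yml from every archive not seen on a previous run.
    Returns the archive paths (relative to authors/id) that had none."""
    id_dir = Path(mirror_root) / "authors" / "id"
    cache_dir, seen_file = Path(cache_dir), Path(seen_file)
    seen = set(seen_file.read_text().splitlines()) if seen_file.exists() else set()
    missing = []
    for archive in sorted(id_dir.rglob("*")):
        rel = archive.relative_to(id_dir).as_posix()  # e.g. A/AU/AUTHOR/Foo-1.0.tar.gz
        if not rel.endswith(TAR_SUFFIXES) or rel in seen:
            continue
        data = None
        try:
            with tarfile.open(archive, "r:*") as tar:  # compression auto-detected
                member = find_meta_member(tar)
                if member is not None:
                    data = tar.extractfile(member).read()
        except (tarfile.TarError, OSError):
            pass  # corrupt or unreadable archive: treat as missing metadata
        if data is None:
            missing.append(rel)
        else:
            target = cache_dir / (rel + ".META.yml")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        seen.add(rel)
    seen_file.write_text("".join(f"{r}\n" for r in sorted(seen)))
    return missing
```

Recording every processed archive as seen keeps later runs incremental; the
returned list is the place to hang retries for the ones without metadata.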

When a META.yml file is missing, the module will require manual
intervention the first time it is installed. Once it is installed, you
will have a working efsdeploy.conf (and possibly some hooks), and future
upgrades should be very easy.

I have also just started to look at things like CPANDB and the plethora of
CPAN-related modules, and it looks like everyone is fighting with the slowly
evolving state of CPAN metadata. Once we have this basic setup working, it
might be worth taking the time to look deeper into the cpantesters code, too.
_______________________________________________
EFS-dev mailing list
[email protected]
http://mailman.openefs.org/mailman/listinfo/efs-dev