Do you think this approach is feasible and, if so, how can I access a
large enough body of changelogs to test and refine it?
http://search.cpan.org/~adamk/CPAN-Mini-Extract-0.13/lib/CPAN/Mini/Extract.pm
use CPAN::Mini::Extract;

# Create a CPAN extractor
my $cpan = CPAN::Mini::Extract->new(
    remote         => 'http://mirrors.kernel.org/cpan/',
    local          => '/home/adam/.minicpan',
    trace          => 1,
    extract        => '/home/adam/.cpanextracted',
    # Keep only .pm files that are not under inc/ or t/
    extract_filter => sub { /\.pm$/ and ! /\b(inc|t)\b/ },
    extract_check  => 1,
);

# Run the minicpan process
$cpan->run;
Why not get all of them?
You could just tweak that extract_filter line to extract only the
Changes/ChangeLog etc. files from the CPAN archives; it should then
mirror a minicpan and extract ALL the changelog files.
Unfortunately, it doesn't have a YAML config file-based front-end yet,
but now that YAML::Tiny mostly works I'll do that eventually.
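For the extract_filter tweak, here is a minimal sketch of what the
filter might look like. It assumes, as the .pm filter above suggests,
that the path of each file in the archive arrives in $_. The name
$is_changelog, the list of file names it accepts and the sample paths
are all made up for illustration, so adjust the pattern after looking
at what real distributions actually ship.

```perl
use strict;
use warnings;

# Hypothetical filter: true when $_ looks like a changelog file.
# The accepted names (Change, Changes, ChangeLog, with an optional
# .txt/.md/.pod suffix, any case) are an assumption, not a complete list.
my $is_changelog = sub {
    m{(?:^|/)change(?:s|log)?(?:\.(?:txt|md|pod))?$}i;
};

# Quick local check against a few made-up archive paths
my @paths = qw(
    Foo-Bar-1.00/Changes
    Foo-Bar-1.00/ChangeLog
    Foo-Bar-1.00/CHANGES.txt
    Foo-Bar-1.00/lib/Foo/Bar.pm
    Foo-Bar-1.00/lib/Foo/Changeset.pm
);

# grep aliases $_ to each path, which is what the filter inspects
my @hits = grep { $is_changelog->() } @paths;
print "$_\n" for @hits;
```

To use it for real, pass extract_filter => $is_changelog in the
constructor call in place of the .pm filter.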
Adam K