On Tue, 2006-07-11 at 16:22 +0200, Thomas Klausner wrote:
> > One thing that'd be useful to have a small module for would be parsing
> > the Changes files and show just the changes in the last version. It'd
> > be a bit of work to make (does the file go backwards or forwards? How
> > are the versions separated? etc etc), but it should be doable to get it
> > to work for 90% of the CPAN distributions.
>
> Such a module would indeed be great. I was toying with the idea of
> writing a small script that compares new cpan uploads with stuff
> installed at my machine(s) and reports new dists and what changed.
>
> But I never got around writing it, as changes-parsing seems quite
> futile.

We wouldn't need full-fledged parsing, I think. It seems safe to assume
that most changelogs are visually blocked. From sifting through a few
examples, a split using qr{(?x) \r?\n \s* \r?\n (?=\w)}, with just
qr{(?x) \r?\n (?=\w)} as a fallback when the first pattern yields no
results, should separate the version blocks. (The lookahead keeps split
from swallowing the first character of each block.) Scanning these blocks
from both the start and the end (using @blocks[1,2,3,-1,4,-2,5, ...])
until a version id is found that matches the newest version should be
able to identify the correct changeset.

Do you think this approach is feasible, and if so, how can I access a
large enough body of changelogs to test and refine it?

Oh, by the way, the first version of this filter is available from
http://cpan.org/authors/id/W/WI/WILLERT/cpan-changes.pl (the name sucks
IMHO), so have a look. Another feature I added yesterday is just the one
you suggested: it can now filter out all uninstalled modules.

Cheers,
Sebastian
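
P.S. To make the block-scanning idea above more concrete, here is a
minimal sketch. It is not the code in cpan-changes.pl. The names
split_blocks, scan_order and latest_changes are made up for illustration,
and two things are my own assumptions: the scan simply alternates between
the front and the back of the block list (a simplification of the index
order given above), and a block only counts as a match when the version
id appears on its first line. The split patterns use a lookahead for the
leading word character so that split does not consume it.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a changelog into visual blocks.
# First try blank-line separation; if that yields fewer than two
# blocks, fall back to "any line that starts with a word character".
sub split_blocks {
    my ($text) = @_;
    my @blocks = split qr{ \r?\n \s* \r?\n (?=\w) }x, $text;
    @blocks = split qr{ \r?\n (?=\w) }x, $text if @blocks < 2;
    return @blocks;
}

# Hypothetical helper: indices alternating front/back: 0, n-1, 1, n-2, ...
# (an assumed simplification; the newest entry is usually near one end).
sub scan_order {
    my ($n) = @_;
    my @order;
    my ($front, $back) = (0, $n - 1);
    while ($front <= $back) {
        push @order, $front++;
        push @order, $back-- if $front <= $back;
    }
    return @order;
}

# Return the block whose first line carries the given version id,
# or undef (empty list in list context) if none is found.
sub latest_changes {
    my ($text, $version) = @_;
    my @blocks = split_blocks($text);
    for my $i (scan_order(scalar @blocks)) {
        my ($head) = $blocks[$i] =~ /\A([^\r\n]*)/;
        # Require the id as a standalone token, so 0.03 does not
        # match inside 10.03 or 0.031.
        return $blocks[$i]
            if $head =~ /(?<![\w.])\Q$version\E(?!\.?\w)/;
    }
    return;
}

# Made-up sample changelog, for illustration only.
my $changes = <<'END';
Revision history for Foo-Bar

0.03  2006-07-10
    - fixed a crash in parse()
    - added tests

0.02  2006-06-01
    - first CPAN release
END

my $found = latest_changes($changes, '0.03');
print defined $found ? "$found\n" : "no matching block\n";
```

Restricting the match to the first line of a block is meant to avoid
false hits on entries like "fixed a bug introduced in 0.02", but it will
miss changelogs that put the version id somewhere else, so it would need
checking against real Changes files.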