Since there has been some discussion recently on improving search.cpan.org search results, here's an initial attempt to apply the Google-inspired PageRank algorithm to Perl modules, treating module dependencies as links:
http://www.math2.org/david/perlrank
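For readers unfamiliar with the idea, here is a minimal Python sketch of PageRank over a dependency graph. It is not the code behind the page above. It assumes the input is a plain mapping from each module to the modules it requires, with "A requires B" counted as a link from A to B, so widely required modules accumulate rank. The damping factor of 0.85 and the fixed iteration count are conventional assumptions, and the module names in `toy` are made-up data, not CPANTS output.

```python
def pagerank(deps, damping=0.85, iterations=50):
    """deps: module -> iterable of modules it requires (treated as outgoing links)."""
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {m: 1.0 / n for m in nodes}
    for _ in range(iterations):
        new = {m: (1.0 - damping) / n for m in nodes}
        dangling = 0.0
        for m in nodes:
            out = set(deps.get(m, ()))  # ignore duplicate prereq entries
            if out:
                share = damping * rank[m] / len(out)
                for t in out:
                    new[t] += share
            else:
                # A module with no prereqs has no outgoing links; collect its rank
                dangling += damping * rank[m]
        # Spread the rank of link-less modules evenly so the total stays 1
        for m in nodes:
            new[m] += dangling / n
        rank = new
    return rank


# Toy dependency data (hypothetical, for illustration only)
toy = {
    "Foo": ["Carp", "Scalar::Util"],
    "Bar": ["Carp"],
    "Scalar::Util": ["Carp"],
    "Carp": [],
}
ranks = pagerank(toy)
for mod, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{mod:15} {score:.3f}")
```

Because every module in the toy data ends up depending on Carp, directly or through Scalar::Util, Carp collects the most rank. The quality of the ranking depends entirely on how complete the dependency mapping is.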
The top rated modules are provided below:
[SNIP]
This seems like a good approach; I'd be interested in seeing more of this.
I think what would improve the relevance the most is a better data set. The CPAN data set was generated from the 'prereq' keys in the Module::CPANTS::asHash module, and this is only a rough representation of the linking structure. It would have been better to download all the CPAN modules and analyze their source code directly.
I use Randal Schwartz's minicpan script <http://www.stonehenge.com/merlyn/LinuxMag/col42.html> to gather statistics about metadata, e.g. to list distributions with nested Makefiles or multiple XS files (see <http://www.thepierianspring.org/meta_stats.pl.html>).
The only problem is that Randal's script sometimes grabs more than just the most recent version of some modules, which can skew statistics somewhat.
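One possible workaround is to post-filter the mirror so that only the newest tarball of each distribution is counted. The Python sketch below is hypothetical and not part of minicpan. It assumes basenames of the form `Dist-Name-1.23.tar.gz`, compares decimal versions by reading the fraction in 3-digit groups (roughly how version.pm relates `1.002003` to `v1.2.3`), and simply skips developer releases with underscores and any other filename it does not recognize.

```python
import re

# Hypothetical filename pattern: "Dist-Name-1.23.tar.gz" or "Dist-Name-v1.2.3.tar.gz".
# Names with underscores (developer releases) or other layouts do not match.
DIST_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>v?[0-9]+(?:\.[0-9]+)*)\.(?:tar\.gz|tgz|zip)$"
)


def version_key(version):
    if version.startswith("v") or version.count(".") >= 2:
        # Dotted-decimal, e.g. "v1.2.3" -> (1, 2, 3)
        parts = [int(p) for p in version.lstrip("v").split(".")]
    else:
        # Decimal, e.g. "1.10" -> (1, 100) and "1.9" -> (1, 900), so 1.9 sorts
        # above 1.10 as it does when Perl compares the numbers.
        whole, _, frac = version.partition(".")
        width = -(-len(frac) // 3) * 3  # round up to a multiple of 3
        frac = frac.ljust(width, "0")
        parts = [int(whole)] + [int(frac[i:i + 3]) for i in range(0, width, 3)]
    while len(parts) > 1 and parts[-1] == 0:  # treat 1.2.0 the same as 1.2
        parts.pop()
    return tuple(parts)


def latest_only(filenames):
    """Keep only the highest-versioned file for each distribution name."""
    best = {}
    for fn in filenames:
        m = DIST_RE.match(fn)
        if not m:
            continue
        name, key = m["name"], version_key(m["version"])
        if name not in best or key > best[name][0]:
            best[name] = (key, fn)
    return sorted(fn for _, fn in best.values())


files = [
    "Foo-Bar-1.9.tar.gz",
    "Foo-Bar-1.10.tar.gz",
    "Baz-v1.2.3.tar.gz",
    "Baz-v1.10.0.tar.gz",
    "Qux-0.01_01.tar.gz",
]
latest = latest_only(files)
print(latest)
```

CPAN version strings are messier than this in practice, so a real filter would be better off delegating the comparison to version.pm itself.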
Randy.
