On 02/20/04 01:22, David Manura wrote:
Since there has been some discussion recently on improving search.cpan.org search results, here's an initial attempt to apply the Google-inspired PageRank algorithm on Perl modules when interpreting module dependencies as links:

http://www.math2.org/david/perlrank

The top rated modules are provided below:

[SNIP]


This seems like a good approach; I'd be interested in seeing more of this.

I think what would improve the relevance the most is a better data set. The CPAN data set was generated from the 'prereq' keys in the Module::CPANTS::asHash module, and this is only a rough representation of the linking structure. It would have been better to download all the CPAN modules and do source code analysis directly on them.
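
To make the idea of treating prerequisites as links concrete, here is a minimal sketch of PageRank by power iteration over a dependency map. It is not the code behind the perlrank page; the function name, the damping factor of 0.85, the iteration count, and the toy data are all assumptions chosen for illustration.

```python
def pagerank(prereqs, damping=0.85, iterations=50):
    """Rank modules, treating "A requires B" as a link from A to B.

    prereqs maps a module name to the list of modules it depends on.
    damping and iterations are illustrative defaults, not values
    taken from the original experiment.
    """
    # Collect every module that appears as a dependent or a prerequisite.
    nodes = set(prereqs)
    for deps in prereqs.values():
        nodes.update(deps)
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}

    rank = {m: 1.0 / n for m in nodes}
    for _ in range(iterations):
        # Every module starts each round with the "random jump" share.
        new_rank = {m: (1.0 - damping) / n for m in nodes}
        for m in nodes:
            deps = prereqs.get(m, [])
            if deps:
                # Split this module's rank among its prerequisites.
                share = damping * rank[m] / len(deps)
                for d in deps:
                    new_rank[d] += share
            else:
                # No prerequisites: spread the rank over all modules
                # so the total stays at 1.
                share = damping * rank[m] / n
                for d in nodes:
                    new_rank[d] += share
        rank = new_rank
    return rank


# Hypothetical toy data, not real CPANTS output.
toy = {
    "App::Foo": ["Test::More", "LWP"],
    "LWP": ["Test::More", "URI"],
    "URI": ["Test::More"],
}
ranks = pagerank(toy)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name:12s} {ranks[name]:.4f}")
```

With a map built from source code analysis instead of 'prereq' keys, the same function would apply unchanged; only the input would be richer.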

I use Randal Schwartz's minicpan script <http://www.stonehenge.com/merlyn/LinuxMag/col42.html> to gather statistics about metadata and to list distributions with nested Makefiles, multiple XS files, etc. (e.g. <http://www.thepierianspring.org/meta_stats.pl.html>)


The only problem is that Randal's script sometimes grabs more than just the most recent version of some modules, which can skew the statistics somewhat.
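
One way to work around the duplicate versions would be to filter the mirrored file list before computing statistics. The sketch below is only that: it does not touch the minicpan script, the helper names and sample filenames are hypothetical, and it assumes dotted numeric versions compared part by part, which does not match Perl's decimal version semantics in every case.

```python
import re

# Assumed filename shape: <dist-name>-<version>.<archive extension>
DIST_RE = re.compile(
    r"^(?P<name>.+?)-v?(?P<version>\d+(?:\.\d+)*)\.(?:tar\.gz|tgz|zip)$"
)


def version_key(version):
    """Turn '1.02' into (1, 2) so versions compare part by part.

    This is a simplification; real CPAN version ordering has more rules.
    """
    return tuple(int(part) for part in version.split("."))


def latest_only(filenames):
    """Keep one file per distribution: the one with the highest version."""
    best = {}
    for filename in filenames:
        match = DIST_RE.match(filename)
        if not match:
            # Skip anything that does not look like a distribution archive.
            continue
        name = match.group("name")
        key = version_key(match.group("version"))
        if name not in best or key > best[name][0]:
            best[name] = (key, filename)
    return sorted(filename for _, filename in best.values())


# Hypothetical mirror contents.
mirrored = [
    "Foo-Bar-1.02.tar.gz",
    "Foo-Bar-1.10.tar.gz",
    "Digest-MD5-2.33.tar.gz",
    "README",
]
kept = latest_only(mirrored)
print(kept)
```

Running the statistics over `kept` rather than the full file list would count each distribution once.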

Randy.
