[ https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605142#action_12605142 ]
Dennis Kubes commented on NUTCH-635:
------------------------------------

Andrzej Bialecki wrote:
> One more question: you said the algorithm converges, but do you have a
> reference set of values from this dataset, calculated using some other
> pagerank impl? It would be worthwhile to make sure that the values are
> indeed the PageRank, as described, and not yet another subtle variation such
> as our OPIC

I was doing it low tech. By turning on debug logging (warning: it is a large output) and using grep, you can see the scores converge after a few iterations ;)

> There are a few Java packages for computing PageRank, we could adapt one of
> those to serve as a baseline:
>
> http://law.dsi.unimi.it/
> http://webla.sourceforge.net/javadocs/pt/tumba/links/PageRank.html

I agree it would be a good comparison. Strictly speaking, though, it is not just pagerank. There are optimizations for multiple links from a given domain, penalties for very few inlinks, and a minimum score value, all of which can be changed through the configuration. Besides that, it does follow the original pagerank algorithm closely.

> LinkAnalysis Tool for Nutch
> ---------------------------
>
>                 Key: NUTCH-635
>                 URL: https://issues.apache.org/jira/browse/NUTCH-635
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch,
> NUTCH-635-3-20080614.patch, NUTCH-635-4-20080615.patch
>
>
> This is a basic pagerank type link analysis tool for nutch which simulates a
> sparse matrix using inlinks and outlinks and converges after a given number
> of iterations. This tool is meant to replace the current scoring system in
> nutch with a system that converges instead of exponentially increasing
> scores. Also includes a tool to create an outlinkdb.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
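
For the baseline comparison discussed in the comment, a minimal sketch of textbook PageRank by power iteration may help as a reference point. This is not the code from the NUTCH-635 patches: it deliberately leaves out the per-domain link limits, the penalty for pages with few inlinks, and the minimum score mentioned above. The function name `pagerank`, the damping value 0.85, the iteration count, and the four-page toy graph are all assumptions made for illustration.

```python
def pagerank(outlinks, damping=0.85, iterations=20):
    """Textbook PageRank by power iteration (illustrative baseline only).

    outlinks maps each page to the list of pages it links to.
    Returns (scores, deltas), where deltas[i] is the largest per-page
    score change observed in iteration i.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(outlinks) | {t for ts in outlinks.values() for t in ts})
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    deltas = []

    for _ in range(iterations):
        # Score held by pages without outlinks is spread evenly over all pages.
        dangling = sum(scores[p] for p in pages if not outlinks.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {p: base for p in pages}

        # Each page passes an equal share of its damped score to its targets.
        for page, targets in outlinks.items():
            if not targets:
                continue
            share = damping * scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share

        deltas.append(max(abs(new_scores[p] - scores[p]) for p in pages))
        scores = new_scores

    return scores, deltas


# Hypothetical toy graph, not taken from any Nutch dataset.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores, deltas = pagerank(graph)
for i, delta in enumerate(deltas, start=1):
    print(f"iteration {i}: max change {delta:.6f}")
```

The per-iteration maximum change plays the same role as grepping the debug log: it should shrink as the scores settle. Comparing final scores from a sketch like this against the tool's output on the same link graph would only be meaningful with the tool's extra adjustments disabled or accounted for.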