It is a great addition to Mahout. Jimmy Lin has implemented PageRank in the Cloud9 MapReduce library as well:
https://github.com/lintool/Cloud9/tree/master/src/dist/edu/umd/cloud9/example/pagerank

On Mon, Jul 11, 2011 at 11:57 AM, Sebastian Schelter (JIRA) <j...@apache.org> wrote:

>
> [https://issues.apache.org/jira/browse/MAHOUT-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063408#comment-13063408]
>
> Sebastian Schelter commented on MAHOUT-742:
> -------------------------------------------
>
> I'd like to note here that this issue only covers the most straightforward
> approach to computing PageRank in Map/Reduce, and the code assumes that the
> PageRank vector and any row of the transition matrix fit into memory.
>
> There are more efficient ways to compute PageRank, as described in
> http://portal.acm.org/citation.cfm?id=1830263, and blockwise matrix-vector
> multiplication is necessary if the vectors no longer fit into memory.
>
> Nevertheless, we should be very happy to have this contribution, and I think
> it offers a good starting point for further improvements.
>
> > Pagerank implementation in Map/Reduce
> > -------------------------------------
> >
> > Key: MAHOUT-742
> > URL: https://issues.apache.org/jira/browse/MAHOUT-742
> > Project: Mahout
> > Issue Type: New Feature
> > Components: Graph
> > Affects Versions: 0.6
> > Reporter: Christoph Nagel
> > Assignee: Sebastian Schelter
> > Fix For: 0.6
> >
> > Attachments: MAHOUT-742.patch
> >
> >
> > Hi,
> > my name is Christoph Nagel. I'm a student at the Technical University of
> > Berlin and am taking the course taught by Isabel Drost and Sebastian Schelter.
> > My work is to implement the PageRank algorithm for the case where the
> > PageRank vector fits in memory.
> > For the computation I used the naive algorithm shown in the book 'Mining
> > of Massive Datasets' by Rajaraman & Ullman (
> > http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf
> > ).
> > Matrix and vector multiplication are done with Mahout methods.
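
For anyone following the thread, the naive in-memory iteration described in the quoted text can be sketched roughly as below. This is a plain-Python illustration and not the code from the patch: the function name `pagerank`, the `damping` and `iterations` parameters, and the four-node example graph are all assumptions made for the sketch. It assumes a column-stochastic transition matrix (entry `[i][j]` is the probability of moving from node `j` to node `i`) that fits in memory, as the issue does.

```python
def pagerank(transition, damping=0.85, iterations=50):
    """Power iteration on a column-stochastic matrix given as a list of rows.

    Each step computes rank' = damping * M * rank + (1 - damping) / n.
    All names and defaults here are illustrative assumptions.
    """
    n = len(transition)
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        rank = [
            damping * sum(transition[i][j] * rank[j] for j in range(n))
            + (1.0 - damping) / n
            for i in range(n)
        ]
    return rank


# Hypothetical four-node graph: A -> B, C, D; B -> A, D; C -> A; D -> B, C.
# Column j holds the outgoing probabilities of node j.
M = [
    [0.0, 0.5, 1.0, 0.0],
    [1 / 3, 0.0, 0.0, 0.5],
    [1 / 3, 0.0, 0.0, 0.5],
    [1 / 3, 0.5, 0.0, 0.0],
]

# damping=1.0 disables teleportation, i.e. the plain matrix-vector iteration.
ranks = pagerank(M, damping=1.0, iterations=100)
```

With teleportation disabled, the ranks for this particular graph settle at 1/3 for A and 2/9 for each of B, C, and D. Each iteration is a single matrix-vector product, which is the part that would need blockwise multiplication once the vector no longer fits in memory.
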
> > Most of the work is transforming the input graph, which has to consist of
> > a nodes file and an edges file.
> > Format of nodes file: <node>\n
> > Format of edges file: <startNode>\t<endNode>\n
> > Therefore I created the following classes:
> > * LineIndexer: assigns each line an index
> > * EdgesToIndex: indexes the nodes of the edges
> > * EdgesIndexToTransitionMatrix: creates the transition matrix
> > * Pagerank: computes PR from transition matrix
> > * JoinNodesWithPagerank: creates the joined output
> > * PagerankExampleJob: does the complete job
> > Each class has a test (except PagerankExampleJob), and I used the example
> > from the book for evaluation.
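
To make the input transformation in the quoted description concrete, here is a small in-memory sketch of the same steps: index the nodes, build the transition matrix from the edges, and join ranks back to node names. It is plain Python rather than Map/Reduce, and the function names (`index_nodes`, `transition_matrix`, `join_with_names`) are hypothetical; they only loosely mirror the classes listed in the issue and are not taken from the patch. The sample node and edge data are also made up for illustration.

```python
def index_nodes(nodes_text):
    """Assign each non-empty line (one node per line) a zero-based index."""
    names = [line for line in nodes_text.splitlines() if line]
    return {name: i for i, name in enumerate(names)}


def transition_matrix(edges_text, index):
    """Build a column-stochastic matrix from '<startNode>\\t<endNode>' lines.

    Entry [i][j] is 1 / outdegree(j) for each edge j -> i. Nodes without
    outgoing edges keep an all-zero column (dangling nodes are not handled).
    """
    n = len(index)
    edges = [tuple(line.split("\t")) for line in edges_text.splitlines() if line]
    out_degree = [0] * n
    for start, _ in edges:
        out_degree[index[start]] += 1
    matrix = [[0.0] * n for _ in range(n)]
    for start, end in edges:
        j = index[start]
        matrix[index[end]][j] += 1.0 / out_degree[j]
    return matrix


def join_with_names(index, ranks):
    """Map each node name back to its rank value."""
    return {name: ranks[i] for name, i in index.items()}


# Made-up sample input in the two file formats described above.
nodes_text = "A\nB\nC\nD\n"
edges_text = "A\tB\nA\tC\nA\tD\nB\tA\nB\tD\nC\tA\nD\tB\nD\tC\n"

index = index_nodes(nodes_text)
matrix = transition_matrix(edges_text, index)
joined = join_with_names(index, [0.4, 0.2, 0.2, 0.2])  # placeholder ranks
```

Here `matrix` is the input a PageRank iteration would consume, and `joined` shows the shape of the final output: one rank per original node name.
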