Yes, Cloud9 as well as Pegasus http://www.cs.cmu.edu/~pegasus/ have more
advanced implementations of PageRank.
As both of these libraries are Apache-licensed, it would also be possible
to use their code in Mahout.
On 11.07.2011 18:04, Dhruv Kumar wrote:
It is a great addition to Mahout.
Jimmy Lin has implemented PageRank in the Cloud9 MapReduce library as
well:
https://github.com/lintool/Cloud9/tree/master/src/dist/edu/umd/cloud9/example/pagerank
On Mon, Jul 11, 2011 at 11:57 AM, Sebastian Schelter (JIRA) <j...@apache.org>
wrote:
[
https://issues.apache.org/jira/browse/MAHOUT-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063408#comment-13063408]
Sebastian Schelter commented on MAHOUT-742:
-------------------------------------------
I'd like to note here that this issue only covers the most straightforward
approach to computing PageRank in Map/Reduce, and the code assumes that the
PageRank vector and any row of the transition matrix fit into memory.
There are more efficient ways to compute PageRank, as described in
http://portal.acm.org/citation.cfm?id=1830263, and blockwise matrix-vector
multiplication becomes necessary once the vectors no longer fit into memory.
Nevertheless, we should be very happy to have this contribution and I think
it offers a good starting point for further improvements.
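To make the memory assumption concrete: in the straightforward approach each
task only needs one row of the transition matrix at a time, but the complete
current PageRank vector, so the per-row update is essentially a dot product
against an in-memory vector. A rough Java sketch for illustration (not the
code from the patch):

  double newRank(double[] matrixRow, double[] pagerankVector) {
    // matrixRow is a single row of the transition matrix;
    // pagerankVector is the full current PageRank vector held in memory.
    double sum = 0.0;
    for (int j = 0; j < pagerankVector.length; j++) {
      sum += matrixRow[j] * pagerankVector[j];
    }
    return sum;
  }

Once the vector itself no longer fits into memory, this per-row scheme breaks
down, which is where the blockwise multiplication mentioned above comes in.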
Pagerank implementation in Map/Reduce
-------------------------------------
Key: MAHOUT-742
URL: https://issues.apache.org/jira/browse/MAHOUT-742
Project: Mahout
Issue Type: New Feature
Components: Graph
Affects Versions: 0.6
Reporter: Christoph Nagel
Assignee: Sebastian Schelter
Fix For: 0.6
Attachments: MAHOUT-742.patch
Hi,
my name is Christoph Nagel. I'm a student at the Technical University of
Berlin and am taking the course given by Isabel Drost and Sebastian Schelter.
My task is to implement the PageRank algorithm for the case where the
PageRank vector fits in memory.
For the computation I used the naive algorithm shown in the book 'Mining
of Massive Datasets' by Rajaraman & Ullman (
http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf
).
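For reference, the naive algorithm from the book boils down to the damped
power iteration v' = beta * M * v + (1 - beta) / n, repeated until it
converges. A small in-memory Java sketch of that iteration (illustration
only; the parameter names are mine, not taken from the patch):

  double[] pagerank(double[][] m, double beta, int iterations) {
    int n = m.length;
    double[] v = new double[n];
    java.util.Arrays.fill(v, 1.0 / n);            // start from the uniform vector
    for (int it = 0; it < iterations; it++) {
      double[] next = new double[n];
      for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
          sum += m[i][j] * v[j];                  // matrix-vector multiplication
        }
        next[i] = beta * sum + (1.0 - beta) / n;  // taxation / random-jump term
      }
      v = next;
    }
    return v;
  }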
Matrix and vector multiplications are done with Mahout methods.
Most of the work is the transformation of the input graph, which has to
consist of a nodes file and an edges file.
Format of the nodes file: <node>\n
Format of the edges file: <startNode>\t<endNode>\n
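For illustration, the input for a tiny made-up graph with three pages A, B
and C (A links to B and C, both link back to A) could look like this:

  nodes file:
    A
    B
    C

  edges file (tab-separated):
    A	B
    A	C
    B	A
    C	A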
Therefore I created the following classes:
* LineIndexer: assigns each line an index
* EdgesToIndex: indexes the nodes of the edges
* EdgesIndexToTransitionMatrix: creates the transition matrix
* Pagerank: computes the PageRank from the transition matrix
* JoinNodesWithPagerank: creates the joined output
* PagerankExampleJob: does the complete job
Each class has a test (except PagerankExampleJob), and I used the example
from the book for evaluation.