Re: [jira] [Commented] (MAHOUT-742) Pagerank implementation in Map/Reduce

Dhruv Kumar Mon, 11 Jul 2011 09:04:36 -0700

It is a great addition to Mahout.

Jimmy Lin has also implemented PageRank in the Cloud9 MapReduce library:

https://github.com/lintool/Cloud9/tree/master/src/dist/edu/umd/cloud9/example/pagerank

On Mon, Jul 11, 2011 at 11:57 AM, Sebastian Schelter (JIRA) <j...@apache.org
> wrote:

>
>    [
> https://issues.apache.org/jira/browse/MAHOUT-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063408#comment-13063408]
>
> Sebastian Schelter commented on MAHOUT-742:
> -------------------------------------------
>
> I'd like to note here that this issue only covers the most straightforward
> approach to computing PageRank in Map/Reduce, and the code assumes that the
> PageRank vector and any row from the transition matrix fit into memory.
>
> There are more efficient ways to compute PageRank, as described in
> http://portal.acm.org/citation.cfm?id=1830263, and blockwise matrix-vector
> multiplication is necessary if the vectors no longer fit into memory.
>
> Nevertheless we should be very happy to have this contribution and I think
> it offers a good starting point for further improvements.
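
To make the blockwise matrix-vector multiplication mentioned in the comment above concrete, here is a minimal single-machine sketch in Python. It is not the code from the patch or from the linked paper. The function names, the `(row, col, value)` entry format, and the `block_size` parameter are assumptions made for illustration. The point it shows is that the product can be computed while looking at only one block of the input vector at a time.

```python
from collections import defaultdict


def matvec(entries, x):
    """Plain sparse matrix-vector product; entries are (row, col, value)."""
    y = [0.0] * len(x)
    for i, j, v in entries:
        y[i] += v * x[j]
    return y


def blockwise_matvec(entries, x, block_size):
    """Same product, computed one (row block, column block) pair at a time.

    Hypothetical sketch: in a distributed setting each block could be
    handled by a separate task; here the blocks are simply looped over.
    """
    n = len(x)
    # Group the matrix entries by the block they fall into.
    blocks = defaultdict(list)
    for i, j, v in entries:
        blocks[(i // block_size, j // block_size)].append((i, j, v))

    y = [0.0] * n
    for (bi, bj), block in sorted(blocks.items()):
        row_base = bi * block_size
        col_base = bj * block_size
        # Only this slice of x is needed for the current block.
        x_block = x[col_base:col_base + block_size]
        partial = [0.0] * block_size
        for i, j, v in block:
            partial[i - row_base] += v * x_block[j - col_base]
        # Add the partial result into the matching slice of y.
        for offset, value in enumerate(partial):
            if row_base + offset < n:
                y[row_base + offset] += value
    return y


# Column-stochastic transition matrix of a small hypothetical 4-node graph.
example_entries = [
    (1, 0, 1 / 3), (2, 0, 1 / 3), (3, 0, 1 / 3),
    (0, 1, 1 / 2), (3, 1, 1 / 2),
    (0, 2, 1.0),
    (1, 3, 1 / 2), (2, 3, 1 / 2),
]
example_x = [0.25, 0.25, 0.25, 0.25]
plain_result = matvec(example_entries, example_x)
blocked_result = blockwise_matvec(example_entries, example_x, block_size=2)
```

Both calls should agree up to floating-point rounding, whatever block size is chosen.
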
>
> > Pagerank implementation in Map/Reduce
> > -------------------------------------
> >
> >                 Key: MAHOUT-742
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-742
> >             Project: Mahout
> >          Issue Type: New Feature
> >          Components: Graph
> >    Affects Versions: 0.6
> >            Reporter: Christoph Nagel
> >            Assignee: Sebastian Schelter
> >             Fix For: 0.6
> >
> >         Attachments: MAHOUT-742.patch
> >
> >
> > Hi,
> > my name is Christoph Nagel. I'm a student at the Technical University of
> > Berlin and am taking the course taught by Isabel Drost and Sebastian Schelter.
> > My task is to implement the PageRank algorithm for the case where the
> > PageRank vector fits in memory.
> > For the computation I used the naive algorithm shown in the book 'Mining
> > of Massive Datasets' by Rajaraman & Ullman (
> > http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf
> > ).
> > Matrix and vector multiplication are done with Mahout methods.
> > Most of the work is transforming the input graph, which has to consist of
> > a nodes file and an edges file.
> > Format of nodes file: <node>\n
> > Format of edges file: <startNode>\t<endNode>\n
> > Therefore I created the following classes:
> > * LineIndexer: assigns each line an index
> > * EdgesToIndex: indexes the nodes of the edges
> > * EdgesIndexToTransitionMatrix: creates the transition matrix
> > * Pagerank: computes PR from transition matrix
> > * JoinNodesWithPagerank: creates the joined output
> > * PagerankExampleJob: does the complete job
> > Each class except PagerankExampleJob has a test, and I used the example from
> > the book for evaluation.
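
The pipeline described above can be sketched on a single machine as follows. This is not the patch itself and does not use Mahout: the Python function names are hypothetical stand-ins for the listed classes. The damped update (rank' = damping * M * rank + (1 - damping) / n), the damping value of 0.85, and the fixed iteration count are assumptions, since the message does not state them.

```python
def index_lines(node_lines):
    """Stand-in for LineIndexer: assign each node line a 0-based index."""
    return {name: idx for idx, name in enumerate(node_lines)}


def edges_to_index(edge_lines, index):
    """Stand-in for EdgesToIndex: parse '<startNode>\\t<endNode>' lines."""
    pairs = []
    for line in edge_lines:
        start, end = line.split("\t")
        pairs.append((index[start], index[end]))
    return pairs


def transition_matrix(indexed_edges, n):
    """Stand-in for EdgesIndexToTransitionMatrix.

    Returns {column: {row: probability}}; column j spreads its weight
    evenly over the nodes that j links to. Nodes without out-links get
    no column, so their rank leaks away in this simplified sketch.
    """
    targets = {j: set() for j in range(n)}
    for start, end in indexed_edges:
        targets[start].add(end)
    return {
        j: {i: 1.0 / len(ends) for i in ends}
        for j, ends in targets.items()
        if ends
    }


def pagerank(matrix, n, damping=0.85, iterations=50):
    """Stand-in for Pagerank: power iteration with an in-memory vector."""
    rank = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [(1.0 - damping) / n] * n
        for j, column in matrix.items():
            for i, prob in column.items():
                nxt[i] += damping * prob * rank[j]
        rank = nxt
    return rank


def join_nodes_with_pagerank(node_lines, rank):
    """Stand-in for JoinNodesWithPagerank: pair node names with ranks."""
    return list(zip(node_lines, rank))


# Hypothetical input in the two file formats described above.
nodes = ["A", "B", "C", "D"]
edges = ["A\tB", "A\tC", "A\tD", "B\tA", "B\tD", "C\tA", "D\tB", "D\tC"]

node_index = index_lines(nodes)
matrix = transition_matrix(edges_to_index(edges, node_index), len(nodes))
result = join_nodes_with_pagerank(nodes, pagerank(matrix, len(nodes)))
```

Here `result` is a list of `(node, rank)` pairs in the order of the nodes file. A real Map/Reduce version would distribute each of these steps rather than hold everything in local dictionaries.
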
>
