[
https://issues.apache.org/jira/browse/MAHOUT-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884332#comment-13884332
]
Nilesh Chakraborty edited comment on MAHOUT-742 at 1/28/14 5:08 PM:
--------------------------------------------------------------------
Hey [~ssc]!
I've been working on an implementation of large-scale blockwise matrix-vector
multiplication using a single MapReduce job. The current algorithms and
implementations need two MapReduce jobs for blockwise multiplication (or any
sparse mat-vec mult where the vector isn't stored in memory). I will be using
it to implement PageRank. I'll benchmark my implementation against the
state-of-the-art in MapReduce-based PageRank - Pegasus (they've contributed the
Pegasus code to https://github.com/intel-hadoop/HiBench).
If my version turns out to be faster, I'll be writing the code for algorithms
like SVD and Lanczos algorithm (http://en.wikipedia.org/wiki/Lanczos_algorithm)
too.
Do you think these can make for a useful contribution to Mahout? I need to keep
that in mind before I go forward with coding.
was (Author: nileshc):
Hey [~ssc], I've been working on an implementation of large-scale blockwise
matrix-vector multiplication using a single MapReduce job. The current
algorithms and implementations need two MapReduce jobs for blockwise
multiplication (or any sparse mat-vec mult where the vector isn't stored in
memory). I will be using it to implement PageRank. I'll benchmark my
implementation against the state-of-the-art in MapReduce-based PageRank -
Pegasus (they've contributed the Pegasus code to
https://github.com/intel-hadoop/HiBench).
If my version turns out to be faster, I'll be writing the code for algorithms
like SVD and Lanczos algorithm (http://en.wikipedia.org/wiki/Lanczos_algorithm)
too.
Do you think these can make for a useful contribution to Mahout? I need to keep
that in mind before I go forward with coding.
> Pagerank implementation in Map/Reduce
> -------------------------------------
>
> Key: MAHOUT-742
> URL: https://issues.apache.org/jira/browse/MAHOUT-742
> Project: Mahout
> Issue Type: New Feature
> Components: Graph
> Affects Versions: 0.6
> Reporter: Christoph Nagel
> Assignee: Sebastian Schelter
> Fix For: 0.6
>
> Attachments: MAHOUT-742.patch
>
>
> Hi,
> my name is Christoph Nagel. I'm student on technical university Berlin and
> participating on the course of Isabel Drost and Sebastian Schelter.
> My work is to implement the pagerank-algorithm, where the pagerank-vector
> fits in memory.
> For the computation I used the naive algorithm shown in the book 'Mining of
> Massive Datasets' from Rajaraman & Ullman
> (http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).
> Matrix- and vector-multiplication are done with mahout methods.
> Most work is the transformation the input graph, which has to consists of a
> nodes- and edges file.
> Format of nodes file: <node>\n
> Format of edges file: <startNode>\t<endNode>\n
> Therefore I created the following classes:
> * LineIndexer: assigns each line an index
> * EdgesToIndex: indexes the nodes of the edges
> * EdgesIndexToTransitionMatrix: creates the transition matrix
> * Pagerank: computes PR from transition matrix
> * JoinNodesWithPagerank: creates the joined output
> * PagerankExampleJob: does the complete job
> Each class has a test (not PagerankExampleJob) and I took the example of the
> book for evaluating.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)