[Hama Wiki] Update of "PageRank" by edwardyoon

Apache Wiki Wed, 23 Jan 2013 21:40:06 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change 
notification.


The "PageRank" page has been changed by edwardyoon:
http://wiki.apache.org/hama/PageRank?action=diff&rev1=10&rev2=11

+ This document assumes that you have already installed a Hama cluster and 
have tested it using some of the examples.
+ 
  == PageRank ==
  
   * Uses the PageRank algorithm described in the Google Pregel paper
   * Introduces partitioning and collective communication
  
- == Usage ==
+ == Run PageRank on Hama Cluster ==
+ 
+ First of all, generate a symmetric adjacency matrix using the gen command. 
  
  {{{
- bin/hama jar ../hama-0.x.0-examples.jar pagerank <input path> <output path> 
[damping factor] [epsilon error] [tasks]
+   % bin/hama jar hama-examples-0.x.0.jar gen symmetric 100 10 randomgraph 2
  }}}
  
- The default parameters for pagerank are:
+ This will create a graph with 100 nodes and 1K edges and store it on HDFS 
as 2 partitions in sequence file format. You can adjust the number of 
partitions and tasks to fit your cluster. Then, run PageRank using:
  
  {{{
- 0.85 0.001
+   % bin/hama jar hama-examples-0.x.0.jar pagerank randomgraph pagerankresult 4
  }}}
  
- As you can see, 0.85 is the damping factor, that is, the probability that a 
user keeps following links instead of "randomly" jumping to another site. See the 
[[http://en.wikipedia.org/wiki/PageRank#The_intentional_surfer_model|Random 
Surfer Model]].
+ == Submit your own graph ==
  
- 0.001 is the convergence error. The error is measured after each 
iteration and tells how much the PageRank of all sites has changed. If you 
set it to a lower value, the computation will take more iterations. 
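
The roles of the damping factor and the convergence error can be illustrated with a small stand-alone sketch. This is not Hama's implementation: the function name `pagerank`, the `max_iterations` cap, the even redistribution of dangling-node rank, and measuring the error as the sum of absolute rank changes are all assumptions made for illustration, following the standard formulation in which the damping factor is the probability of following a link.

```python
def pagerank(graph, damping=0.85, epsilon=0.001, max_iterations=100):
    """Iterate PageRank on a dict mapping each site to its list of outlinks.

    Assumption: every outlink also appears as a key of `graph`.
    Returns the ranks and the number of iterations performed.
    """
    n = len(graph)
    ranks = {site: 1.0 / n for site in graph}
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # Rank held by dangling nodes (no outlinks) is spread over all sites.
        dangling = sum(ranks[s] for s, out in graph.items() if not out)
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {site: base for site in graph}
        for site, outlinks in graph.items():
            for target in outlinks:
                new_ranks[target] += damping * ranks[site] / len(outlinks)
        # Assumed error measure: total absolute change over all sites.
        error = sum(abs(new_ranks[s] - ranks[s]) for s in graph)
        ranks = new_ranks
        if error < epsilon:
            break
    return ranks, iteration


if __name__ == "__main__":
    example = {"Site1": ["Site2", "Site3"], "Site2": ["Site3"], "Site3": []}
    ranks, iterations = pagerank(example)
    for site in sorted(ranks):
        print(site, ranks[site])
    print("iterations:", iterations)
```

In this sketch a smaller `epsilon` can only keep or increase the number of iterations, and the ranks sum to 1.0 after every iteration.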
+ See [[WriteHamaGraphFile]]
  
- == Submit your own Web-graph ==
- 
- You can transform your graph into an adjacency list that fits the input which 
Hama is going to parse to calculate the PageRank.
- 
- The file that Hama can successfully parse is a TextFile that has the 
following layout:
- 
- {{{
- Site1\tSite2\tSite3
- Site2\tSite3
- Site3
- }}}
- 
- This piece of text links Site1 to Site2 and Site3, and Site2 to Site3; 
Site3 is a dangling node.
- As you can see, a site is always on the leftmost side (we call it the 
key-site), and its outlinks follow as elements separated by tabs (\t).
- 
- Make sure that every site's outlink can also be found somewhere in the file 
as a key-site. Otherwise it will result in NullPointerExceptions.
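
The key-site requirement can be checked before a file is submitted. The following is a minimal sketch, assuming the tab-separated layout described above; the helper names `parse_adjacency_list` and `missing_key_sites` are hypothetical and not part of Hama.

```python
def parse_adjacency_list(text):
    """Parse tab-separated lines into a dict: key-site -> list of outlinks."""
    graph = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        graph[fields[0]] = fields[1:]
    return graph


def missing_key_sites(graph):
    """Return, sorted, the outlinks that never appear as a key-site."""
    targets = {target for outlinks in graph.values() for target in outlinks}
    return sorted(targets - set(graph))


if __name__ == "__main__":
    sample = "Site1\tSite2\tSite3\nSite2\tSite3\nSite3\n"
    print(missing_key_sites(parse_adjacency_list(sample)))
```

An empty result means every outlink has its own line in the file; any name that is returned would need a line of its own, even if that site has no outlinks.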
- 
- Then you can run pagerank on it with:
- 
- {{{
- bin/hama jar ../hama-0.x.0-examples.jar pagerank /tmp/input/input.txt 
/tmp/pagerank-output
- }}}
- 
- Note that based on what you have configured, the paths may be in HDFS or on 
local disk.
- 
- == Output ==
- 
- The output for each site is a double value between zero and 1.0, where a 
value close to 1.0 indicates a very "famous" site.
- 
- All pages' rank should sum up to 1.0, otherwise the algorithm is broken.
- 
- 
- == Implementation ==
- 
- For detailed questions about the implementation, have a look at my blog.
- It describes the algorithm and focuses on the main ideas behind the 
implementation. 
- It contains ancient code from before Hama 0.5, where we introduced the graph 
API.
- 
- http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
-
