Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The "PageRank" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/PageRank?action=diff&rev1=9&rev2=10

   * Uses the PageRank algorithm described in the Google Pregel paper
   * Introduces partitioning and collective communication
-  * Lets the user submit his/her own TextFile to calculate the sites' Pagerank!

  == Usage ==
  {{{
- bin/hama jar ../hama-0.4.0-examples.jar pagerank <input path> <output path> [damping factor] [epsilon error] [tasks]
+ bin/hama jar ../hama-0.x.0-examples.jar pagerank <input path> <output path> [damping factor] [epsilon error] [tasks]
  }}}

  The default parameters for pagerank are:
@@ -39, +38 @@
  Make sure that every site's outlink can somewhere be found in the file as a key-site. Otherwise it will result in weird NullPointerExceptions.

- Now you need to transform the text file using:
- {{{
- bin/hama jar ../hama-0.4.0-examples.jar pagerank-text2seq /tmp/input.txt /tmp/out/
- }}}
-
  Then you can run pagerank on it with:

  {{{
- bin/hama jar ../hama-0.4.0-examples.jar pagerank /tmp/out /tmp/pagerank-output
+ bin/hama jar ../hama-0.x.0-examples.jar pagerank /tmp/input/input.txt /tmp/pagerank-output
  }}}

  Note that based on what you have configured, the paths may be in HDFS or on local disk.
@@ -59, +53 @@
  All pages' rank should sum up to 1.0, otherwise the algorithm is broken.

- == Sample Adjacencylist File ==
-
- You can create a large pagerank input file by using the PagerankTeragen file from here: http://code.google.com/p/hama-shortest-paths/source/browse/trunk/hama-gsoc/src/de/jungblut/hama/util/PagerankTeragen.java
-
- It is based on MapReduce and requires a running Hadoop cluster. You can create a file using
-
- {{{
- hadoop/bin hadoop -jar <jar containing the pagerank teragen> <number of vertices> <number of reducers / output files> <number of edges per vertex> <output path>
- }}}
-
- Have fun! If you are facing problems, feel free to ask questions on the official mailing list.
-
-
  == Implementation ==

  For detailed questions in terms of implementation have a look at my blog.
- It describes the algorithm and focuses on the main ideas showing implementation things.
+ It describes the algorithm and focuses on the main ideas showing implementation things.
+ It contains ancient code from before Hama 0.5 where we introduced the graph API.

  http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
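
The page states two checks: every outlink must also appear as a key-site, and all ranks should sum to 1.0. The following is a minimal standalone sketch of both checks in plain Python. It does not use the Hama API and is not the implementation from the blog post. The tab-separated line format (key-site followed by its outlinks), the helper names, the even redistribution of rank from sites without outlinks, and the values chosen for the damping factor and epsilon are all assumptions made for illustration.

```python
def parse_adjacency(lines):
    """Parse 'site<TAB>outlink<TAB>outlink...' lines (assumed format) into a dict."""
    graph = {}
    for line in lines:
        parts = line.strip().split("\t")
        if not parts[0]:
            continue  # skip blank lines
        graph[parts[0]] = parts[1:]
    return graph


def missing_key_sites(graph):
    """Return, sorted, every outlink that never appears as a key-site."""
    return sorted({t for targets in graph.values() for t in targets if t not in graph})


def pagerank(graph, damping=0.85, epsilon=1e-6, max_iterations=100):
    """Iterate until the summed rank change drops below epsilon.

    damping and epsilon are illustrative values, not Hama's documented defaults.
    """
    missing = missing_key_sites(graph)
    if missing:
        raise ValueError("outlinks without a key-site: %s" % missing)
    n = len(graph)
    ranks = {site: 1.0 / n for site in graph}
    for _ in range(max_iterations):
        # Assumption: rank held by sites without outlinks is spread evenly,
        # which keeps the total at 1.0.
        dangling = sum(ranks[s] for s, out in graph.items() if not out)
        new_ranks = {s: (1.0 - damping) / n + damping * dangling / n for s in graph}
        for site, outlinks in graph.items():
            for target in outlinks:
                new_ranks[target] += damping * ranks[site] / len(outlinks)
        error = sum(abs(new_ranks[s] - ranks[s]) for s in graph)
        ranks = new_ranks
        if error < epsilon:
            break
    return ranks


if __name__ == "__main__":
    sample = ["a\tb\tc", "b\tc", "c\ta"]  # hypothetical three-site input
    result = pagerank(parse_adjacency(sample))
    print(result, sum(result.values()))
```

Running `missing_key_sites` on an input file before submitting the job may help catch the dangling outlinks that the page says lead to NullPointerExceptions.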
