[ 
https://issues.apache.org/jira/browse/HAMA-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598809#comment-13598809
 ] 

Edward J. Yoon commented on HAMA-743:
-------------------------------------

First of all, you'll need to understand Bigtable's data model. Each cell is 
stored in a 3D (row, column, timestamp) space. Rows are kept sorted in 
lexicographic order by row key.
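
To make the data model concrete, here is a minimal in-memory sketch. It uses 
plain Java sorted maps only; the CellCube class and its methods are 
hypothetical names for illustration and are not part of Accumulo or HBase. 
Keeping the newest timestamp first is an assumption taken from the Bigtable 
paper.

```java
import java.util.Comparator;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a Bigtable-style table (not a real client API). */
class CellCube {
    // row key -> column name -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    /** Stores one version of a cell at (row, column, timestamp). */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest value of a cell, or null if the cell does not exist. */
    String getLatest(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Row keys in lexicographic order, which is what makes key-range scans possible. */
    Iterable<String> rowKeys() {
        return rows.keySet();
    }
}
```

Because the outer map is sorted, a scan over a contiguous key range touches 
neighbouring rows only, which is the property a key-range partitioner relies on.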

Once you feel ready, try to create the webtable described in Google's Bigtable 
paper (the row key is the URL, and the column families are anchor, contents, 
charset, etc.). Please ignore the timestamp dimension to avoid complexity. 
Then you'll realize that the rows together with the 'anchor' column family 
form a sparse inlink-by-outlink matrix.
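
The matrix view can be sketched as follows. This is only an illustration under 
assumptions: plain Java maps stand in for the webtable, column names are 
assumed to look like "anchor:<referring URL>" as in the Bigtable paper, and 
AnchorMatrix with its methods is a hypothetical name, not an existing class.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: extracts the link matrix from an in-memory webtable. */
class AnchorMatrix {
    static final String ANCHOR_PREFIX = "anchor:";

    /**
     * Input: row key (URL) -> column name -> value.
     * Output: one sparse row per page, listing the pages that link to it.
     */
    static Map<String, Set<String>> inlinks(Map<String, Map<String, String>> webtable) {
        Map<String, Set<String>> matrix = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : webtable.entrySet()) {
            for (String column : row.getValue().keySet()) {
                // Only the anchor family carries link structure; skip contents, charset, etc.
                if (column.startsWith(ANCHOR_PREFIX)) {
                    String referrer = column.substring(ANCHOR_PREFIX.length());
                    matrix.computeIfAbsent(row.getKey(), k -> new TreeSet<>()).add(referrer);
                }
            }
        }
        return matrix;
    }

    /** Transposes the matrix: source page -> pages it links to (its outlinks). */
    static Map<String, Set<String>> outlinks(Map<String, Set<String>> inlinks) {
        Map<String, Set<String>> transposed = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : inlinks.entrySet()) {
            for (String source : entry.getValue()) {
                transposed.computeIfAbsent(source, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return transposed;
    }
}
```

PageRank needs the outlink direction, so the transpose step (or an equivalent 
regrouping) would have to happen somewhere between the table scan and the 
graph job.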

The next step is the PageRank calculation. Read Google's Pregel paper and look 
at Hama's implementation.
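
Before reading the Hama code, it may help to see the vertex-centric idea in 
isolation. The sketch below simulates supersteps in plain Java and does not 
use the Hama API; the class name, the fixed superstep count, and the damping 
parameter are assumptions for illustration. It also assumes every link target 
appears as a key in the input map, and pages without outlinks simply send 
nothing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of Pregel-style supersteps (hypothetical, not Hama's PageRank). */
class PregelStylePageRank {
    static Map<String, Double> run(Map<String, List<String>> outlinks,
                                   int supersteps, double damping) {
        int n = outlinks.size();
        Map<String, Double> rank = new HashMap<>();
        for (String vertex : outlinks.keySet()) {
            rank.put(vertex, 1.0 / n); // uniform initial rank
        }
        for (int step = 0; step < supersteps; step++) {
            // Send phase: each vertex splits its rank evenly among its outlinks.
            Map<String, Double> inbox = new HashMap<>();
            for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
                List<String> targets = entry.getValue();
                if (targets.isEmpty()) {
                    continue; // assumption: dangling pages send no messages
                }
                double share = rank.get(entry.getKey()) / targets.size();
                for (String target : targets) {
                    inbox.merge(target, share, Double::sum);
                }
            }
            // The loop boundary plays the role of the barrier; then each vertex
            // recomputes its rank from the messages it received.
            for (String vertex : outlinks.keySet()) {
                double received = inbox.getOrDefault(vertex, 0.0);
                rank.put(vertex, (1 - damping) / n + damping * received);
            }
        }
        return rank;
    }
}
```

In a real BSP job the send phase and the compute phase run on different peers 
and the barrier is a synchronization call, but the per-vertex arithmetic is 
the same.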

References:

 - http://svn.apache.org/repos/asf/accumulo/contrib/bsp/trunk/src/main/java/org/apache/accumulo/bsp/
 - http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/PageRank.java
                
> GSoC 2013, Accumulo/HBase's webtable and Hama's PageRank
> --------------------------------------------------------
>
>                 Key: HAMA-743
>                 URL: https://issues.apache.org/jira/browse/HAMA-743
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Edward J. Yoon
>              Labels: gsoc, gsoc2013, mentor
>
> You'll learn about and experiment with Google's Bigtable and Pregel by using 
> Apache Accumulo and Hama.
> The implementation issues are an input formatter and a partitioner for 
> extracting the 2D matrix from the webtable and partitioning splits by key 
> range.
