[ 
https://issues.apache.org/jira/browse/HAMA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048708#comment-13048708
 ] 

Thomas Jungblut edited comment on HAMA-395 at 6/13/11 7:48 PM:
---------------------------------------------------------------

great :)

Currently crawling around 105000 sites with their outlinks. Tomorrow I'm going 
to reduce the dataset to an adjacency list and write a BSP parser for that.
I've decided to use Text.class as key and value; the value is a 
semicolon-separated list of hosts. Each element represents a normalized host 
like stackoverflow.com or google.com.
So the key is the site and the value is a separated list of its outlinks.
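
A minimal sketch of that record layout, assuming plain Strings in place of 
Hadoop's Text so it runs without any dependencies. The class and method names 
(AdjacencyRecord, parseOutlinks, formatOutlinks) are made up for illustration 
and are not part of the patch:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the HAMA-395 patch.
class AdjacencyRecord {

    // Splits a value such as "google.com;apache.org" into its hosts,
    // trimming whitespace and skipping empty entries.
    static List<String> parseOutlinks(String value) {
        List<String> outlinks = new ArrayList<>();
        for (String host : value.split(";")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                outlinks.add(trimmed);
            }
        }
        return outlinks;
    }

    // Joins the outlinks back into the semicolon-separated value form.
    static String formatOutlinks(List<String> outlinks) {
        return String.join(";", outlinks);
    }

    public static void main(String[] args) {
        // Key: the site. Value: the list of its outlinks.
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        adjacency.put("stackoverflow.com", parseOutlinks("google.com;apache.org"));
        for (Map.Entry<String, List<String>> e : adjacency.entrySet()) {
            System.out.println(e.getKey() + " -> " + formatOutlinks(e.getValue()));
        }
    }
}
```

In a real job the key and value would be wrapped in Text instances; only the 
splitting logic is shown here.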

Do you need any other input formatting, Steve? 
As far as I can see, it is parsing a text file with the default delimiters of 
StringTokenizer, where the first element of a line is the page and the 
following elements are the outlinks.
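
A rough sketch of how such a line could be read, assuming only what is 
described above: the first token is the page, the remaining tokens are its 
outlinks, and tokens are split on StringTokenizer's default delimiters (space, 
tab, newline, carriage return, form feed). The class name PageLineParser is 
hypothetical and this is not the code from the existing example:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical parser, written only to illustrate the line format.
class PageLineParser {

    // Returns the page (first token) mapped to its outlinks (remaining tokens).
    static Map.Entry<String, List<String>> parseLine(String line) {
        StringTokenizer tokens = new StringTokenizer(line); // default delimiters
        if (!tokens.hasMoreTokens()) {
            throw new IllegalArgumentException("line contains no page");
        }
        String page = tokens.nextToken();
        List<String> outlinks = new ArrayList<>();
        while (tokens.hasMoreTokens()) {
            outlinks.add(tokens.nextToken());
        }
        return new AbstractMap.SimpleEntry<>(page, outlinks);
    }

    public static void main(String[] args) {
        Map.Entry<String, List<String>> entry =
                parseLine("stackoverflow.com google.com\tapache.org");
        System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
}
```

With this format a host name must not contain whitespace, which holds for 
normalized hosts like the ones mentioned above.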

The Amazon instance crashed, so the crawl failed. Do we have another large dataset?

> Example: PageRank
> -----------------
>
>                 Key: HAMA-395
>                 URL: https://issues.apache.org/jira/browse/HAMA-395
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp, examples
>    Affects Versions: 0.2.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-395-v1.patch, HAMA-395-v2.patch, HAMA-395-v3.patch, 
> HAMA-395.patch
>
>
> I'd like to contribute my PageRank BSP as an example. 
> http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
> TODO:
> - refactor the partitioning from the SSSP patch in 
> https://issues.apache.org/jira/browse/HAMA-359 (extract a utility class, etc.)
> - add a really cool web-sub-graph example dataset ;D
> - add a wiki page for it
