[jira] [Commented] (HAMA-409) Add Pregel-like API

Miklos Erdelyi (JIRA) Mon, 11 Jul 2011 23:18:39 -0700

    [ 
https://issues.apache.org/jira/browse/HAMA-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063739#comment-13063739
 ]


Miklos Erdelyi commented on HAMA-409:
-------------------------------------


To get HAMA graph running from the attached archive please follow the below 
steps:
1) Unzip the archive in the root of a freshly checked out HAMA trunk.

2) Get fastutil (http://fastutil.dsi.unimi.it/) and put its jar under 
${basedir}/lib. I needed to put two other dependencies there because I could 
not find them in a public maven repository (dsiutils and webgraph).

3) To run the example inlink counter hama-graph job, invoke:
java org.apache.hama.graph.GraphComputeRunner <path-to-partitioned-graph>/ 
<number-of-partitions> org.apache.hama.graph.examples.InlinkCountComputer 
<max-iterations>

All the dependencies need to be on the classpath including the used HAMA 
distribution's config directory. The partitioned graph needs to be accessible 
through NFS on all the cluster nodes.

A few thoughts on current implementation:

a) Graph input
Currently HipG's (www.cs.vu.nl/~ekr/hipg/) segmented graph loader is used for 
loading partitioned graphs. A conversion util is provided 
(org.apache.hama.graph.format.ConvertFromWebGraphASCII.java) which can convert 
from WebGraph's (http://webgraph.dsi.unimi.it/) ASCII format into the segmented 
graph format.

This mechanism should be replaced with a builder-like approach employed by e.g. 
GoldenOrb: based on some input specification a factory class dependent on the 
input format of the graph should load vertexes into memory.
A good approach would be letting workers load data-local segments of the graph 
and do a partition-worker assignment based on data-locality, after which loaded 
vertexes not belonging to one worker's partition(s) could be sent to the proper 
workers via network.

b) Message buffering
In the current implementation messages are grouped by vertexes after a new 
superstep starts, i.e., VertexComputerGeneric iterates through all messages 
recevied by the BSPPeer and puts them into the queue of respective vertexes.
This memory copying could be avoided by letting the BSP framework do the 
grouping while receiving the messages, i.e., by grouping messages by tag such 
that messages intended for the same vertex ID will have the same tag.

c) Output
Currently nothing is output from the vertexes. However, it would be relatively 
easy to add, e.g., by requiring that the Vertex class implement Writable and 
serializing all vertexes at the end of certain supersteps. This would be a step 
towards fault-tolerance too.


> Add Pregel-like API
> -------------------
>
>                 Key: HAMA-409
>                 URL: https://issues.apache.org/jira/browse/HAMA-409
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp, examples
>    Affects Versions: 0.3.0
>            Reporter: Thomas Jungblut
>              Labels: graph
>             Fix For: 0.4.0
>
>         Attachments: hama-graph.zip
>
>
> According to what we've discussed on the mailing list, we should add a 
> Pregel-like API.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HAMA-409) Add Pregel-like API

Reply via email to