[
https://issues.apache.org/jira/browse/HAMA-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063739#comment-13063739
]
Miklos Erdelyi commented on HAMA-409:
-------------------------------------
To get HAMA graph running from the attached archive please follow the below
steps:
1) Unzip the archive in the root of a freshly checked out HAMA trunk.
2) Get fastutil (http://fastutil.dsi.unimi.it/) and put its jar under
${basedir}/lib. I needed to put two other dependencies there because I could
not find them in a public maven repository (dsiutils and webgraph).
3) To run the example inlink counter hama-graph job, invoke:
java org.apache.hama.graph.GraphComputeRunner <path-to-partitioned-graph>/
<number-of-partitions> org.apache.hama.graph.examples.InlinkCountComputer
<max-iterations>
All the dependencies need to be on the classpath including the used HAMA
distribution's config directory. The partitioned graph needs to be accessible
through NFS on all the cluster nodes.
A few thoughts on current implementation:
a) Graph input
Currently HipG's (www.cs.vu.nl/~ekr/hipg/) segmented graph loader is used for
loading partitioned graphs. A conversion util is provided
(org.apache.hama.graph.format.ConvertFromWebGraphASCII.java) which can convert
from WebGraph's (http://webgraph.dsi.unimi.it/) ASCII format into the segmented
graph format.
This mechanism should be replaced with a builder-like approach employed by e.g.
GoldenOrb: based on some input specification a factory class dependent on the
input format of the graph should load vertexes into memory.
A good approach would be letting workers load data-local segments of the graph
and do a partition-worker assignment based on data-locality, after which loaded
vertexes not belonging to one worker's partition(s) could be sent to the proper
workers via network.
b) Message buffering
In the current implementation messages are grouped by vertexes after a new
superstep starts, i.e., VertexComputerGeneric iterates through all messages
recevied by the BSPPeer and puts them into the queue of respective vertexes.
This memory copying could be avoided by letting the BSP framework do the
grouping while receiving the messages, i.e., by grouping messages by tag such
that messages intended for the same vertex ID will have the same tag.
c) Output
Currently nothing is output from the vertexes. However, it would be relatively
easy to add, e.g., by requiring that the Vertex class implement Writable and
serializing all vertexes at the end of certain supersteps. This would be a step
towards fault-tolerance too.
> Add Pregel-like API
> -------------------
>
> Key: HAMA-409
> URL: https://issues.apache.org/jira/browse/HAMA-409
> Project: Hama
> Issue Type: New Feature
> Components: bsp, examples
> Affects Versions: 0.3.0
> Reporter: Thomas Jungblut
> Labels: graph
> Fix For: 0.4.0
>
> Attachments: hama-graph.zip
>
>
> According to what we've discussed on the mailing list, we should add a
> Pregel-like API.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira