[
https://issues.apache.org/jira/browse/HAMA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011153#comment-13011153
]
Thomas Jungblut commented on HAMA-359:
--------------------------------------
Hehe, I should've looked at the age of the files :o)
The advantage of using HBase is the following:
HBase is built on top of HDFS, so it is controlling the distribution of the
data.
BUT this could be a large disadvantage too, in the case if a groom is not
running on a server where the data is actually stored.
In MapReduce this is called a non-local task, so you have to copy the data to
the local datanode.
Maybe Edward can tell us why he decided to get rid of HBase and MapReduce at
all.
Using a GraphDB and an interface like Blueprints is like using a MySQL Database
with JDBC inside a distributed environment. It is possible, but IMHO it is not
optimal.
My scenario would be:
- you have vertices represented as edges inside of a SequenceFile in this
style: inVertex;outVertex;weight; and so on... (quite similar to graphbase)
-- the user could then simply setup a MapReduce job that connects with
Blueprints and write it into the SequenceFile.
- we build an index on top of this, let this simply be a hashmap with a vertex
and its byte offset in the sequence file
Graphbase could make this work for us.
- then we focus on some basic Dijkstra, so the weights are positive, no matter
if the graph is directed or not.
-- we see how we can distribute this over many machines
-- my first approach would be a master server that holds a shortest-path-matrix
and let the slaves calculate the distance between the vertices and after that
communicate with the master, it simply sums them up and updates the matrix.
- for negative weights and very large datasets we could implement a A* heuristic
This is not really optimal, I know that. But this would be my first approach on
this topic.
So what are your thoughts on that?
> Development of Shortest Path Finding Algorithm
> ----------------------------------------------
>
> Key: HAMA-359
> URL: https://issues.apache.org/jira/browse/HAMA-359
> Project: Hama
> Issue Type: New Feature
> Components: examples
> Affects Versions: 0.2.0
> Reporter: Edward J. Yoon
> Labels: gsoc, gsoc2011, mentor
> Fix For: 0.3.0
>
> Original Estimate: 2016h
> Remaining Estimate: 2016h
>
> The goal of this project is development of parallel algorithm for finding a
> Shortest Path using Hama BSP.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira