[ 
https://issues.apache.org/jira/browse/HAMA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011153#comment-13011153
 ] 

Thomas Jungblut commented on HAMA-359:
--------------------------------------

Hehe, I should've looked at the age of the files :o)

The advantage of using HBase is the following:
HBase is built on top of HDFS, so it is controlling the distribution of the 
data.
BUT this could be a large disadvantage too, in the case if a groom is not 
running on a server where the data is actually stored.
In MapReduce this is called a non-local task, so you have to copy the data to 
the local datanode. 

Maybe Edward can tell us why he decided to get rid of HBase and MapReduce at 
all.

Using a GraphDB and an interface like Blueprints is like using a MySQL Database 
with JDBC inside a distributed environment. It is possible, but IMHO it is not 
optimal.

My scenario would be:

- you have vertices represented as edges inside of a SequenceFile in this 
style: inVertex;outVertex;weight; and so on... (quite similar to graphbase)
-- the user could then simply setup a MapReduce job that connects with 
Blueprints and write it into the SequenceFile.
- we build an index on top of this, let this simply be a hashmap with a vertex 
and its byte offset in the sequence file
Graphbase could make this work for us.

- then we focus on some basic Dijkstra, so the weights are positive, no matter 
if the graph is directed or not.
-- we see how we can distribute this over many machines
-- my first approach would be a master server that holds a shortest-path-matrix 
and let the slaves calculate the distance between the vertices and after that 
communicate with the master, it simply sums them up and updates the matrix.
- for negative weights and very large datasets we could implement a A* heuristic

This is not really optimal, I know that. But this would be my first approach on 
this topic. 
So what are your thoughts on that?

> Development of Shortest Path Finding Algorithm
> ----------------------------------------------
>
>                 Key: HAMA-359
>                 URL: https://issues.apache.org/jira/browse/HAMA-359
>             Project: Hama
>          Issue Type: New Feature
>          Components: examples
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>              Labels: gsoc, gsoc2011, mentor
>             Fix For: 0.3.0
>
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
>
> The goal of this project is development of parallel algorithm for finding a 
> Shortest Path using Hama BSP.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to