[
https://issues.apache.org/jira/browse/HAMA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014469#comment-13014469
]
Thomas Jungblut commented on HAMA-359:
--------------------------------------
Thanks :)
Yesterday I setup the HBase and I decided to go a different way in the table
layout.
The adjacency list will just contain the row numbers and the weights.
And you have a "lookup-table" that maps the vertex name, in my case the city
name to the rownumber.
adjacency_list => rowId : rowId of the out-vertex and its weight
vertex_lookup => vertex : rowId
In the calculation of the shortest paths you don't need the name, so this won't
blow up the table too much.
But if you are really looking for the shortest way from London to Paris you
have to translate the city name to the rowid in order to start the shortest
paths.
I guess this is a much better approach.
The first naive implementation of dijkstra would be:
You partition the row's to available grooms as boundaries (such as LIMIT's in a
database, e.G from row 0 to row 10k). I know that if you have an adjecency
list, the distribution is not perfectly balanced, I will provide a more
advanced algorithm later.
Then you pass the grooms a message which tells them what is the current
start-vertex.
Every groom is now searching for the lowest-weighted adjacent vertex and passes
this message to a "master-groom" like in the PI Estimator.
After that you are syncing, the master will detect the globally clostest vertex
and broadcasts this as the new start-vertex to the grooms.
Of course you have to actually repartition the datasets, because you can
strike-through the row of the last start vertex, which will reduce the "height"
of the list.
Then this steps will go over and over again until you have reached the breaking
condition.
Note that this is just a few thoughts, there are some questions left:
How to store the already visited vertices?
How to store the direct ancestor of a vertex to reconstruct the shortest paths?
Or in the slides this is called the distance vector D.
Is this optimal in the case of a sync step for every vertex in the graph?
etc.
So I will have a few things left for the actual summer of code ;)
Thank you guys for your support.
> Development of Shortest Path Finding Algorithm
> ----------------------------------------------
>
> Key: HAMA-359
> URL: https://issues.apache.org/jira/browse/HAMA-359
> Project: Hama
> Issue Type: New Feature
> Components: examples
> Affects Versions: 0.2.0
> Reporter: Edward J. Yoon
> Labels: gsoc, gsoc2011, mentor
> Fix For: 0.3.0
>
> Original Estimate: 2016h
> Remaining Estimate: 2016h
>
> The goal of this project is development of parallel algorithm for finding a
> Shortest Path using Hama BSP.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira