[
https://issues.apache.org/jira/browse/HAMA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015451#comment-13015451
]
Thomas Jungblut commented on HAMA-359:
--------------------------------------
For the first naive partitioning idea it is necessary to call this method:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getEndKeys%28%29
This will return your maximum row key; in this case the IDs are positive, so
the range goes from zero to the max (the return value of the method).
Afterwards you distribute the ranges across the available grooms.
For an adjacency matrix this is going to be quite a good partitioning: every
row has the same length.
In my case this is a list, so you have to watch out for edge hotspots (in my
example there is a fixed number of edges, so this won't be critical at all).
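The naive idea can be sketched as a pure range split: given the maximum row key (as returned by the HTable call above) and the number of grooms, compute contiguous, evenly sized key ranges. This is only an illustrative sketch; the class and method names are hypothetical, not Hama API.

```java
// Hypothetical sketch of the naive range partitioning: split the key
// space [0, maxKey] evenly across the available grooms.
public class NaivePartitioner {

    // Returns the exclusive upper boundary of each groom's key range,
    // so groom i owns keys [bounds[i-1], bounds[i]) with bounds[-1] = 0.
    public static long[] rangeBoundaries(long maxKey, int numGrooms) {
        long[] bounds = new long[numGrooms];
        long span = maxKey + 1; // keys 0..maxKey inclusive
        for (int i = 0; i < numGrooms; i++) {
            bounds[i] = (i + 1) * span / numGrooms;
        }
        return bounds;
    }
}
```

For an adjacency matrix with fixed row length this split is also balanced in bytes, which is exactly why the naive scheme works there.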
*The longer-running partitioning:*
My advanced partitioning idea is that you set up an HTable scanner job and
pass the number of grooms to the job via the configuration.
In the map phase the job determines how many row bytes a specific key chunk
(say 10k vertices, or a size based on the number of grooms) currently
contains.
In the reduce phase you can then determine which chunk is a hotspot, or
whether the data is already evenly distributed. If the distribution is even,
you know you can use the naive partitioning.
But if it is not even, you'll have to chunk the hotspot(s) themselves (say,
divide them in two). Write out the chunked-hotspot boundaries (or put them
into a conf) and run the algorithm again until the data is better distributed.
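The detect-and-split loop above can be sketched in isolation, working on the per-chunk byte counts the scanner job would report. The 2x-of-average threshold and the assumption that a hotspot splits into two equal halves are both illustrative choices, not anything from Hama or HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the iterative hotspot splitting: any chunk whose
// byte count exceeds twice the average is considered a hotspot and is
// replaced by two halves; repeat until the distribution is even.
public class HotspotSplitter {

    // "Even" here means: no chunk holds more than twice the average.
    public static boolean isEven(List<Long> chunkBytes) {
        long total = 0;
        for (long b : chunkBytes) total += b;
        double avg = (double) total / chunkBytes.size();
        for (long b : chunkBytes)
            if (b > 2 * avg) return false;
        return true;
    }

    // One round of splitting: hotspots are divided in two, assuming
    // rows are spread roughly evenly inside each chunk.
    public static List<Long> splitHotspots(List<Long> chunkBytes) {
        long total = 0;
        for (long b : chunkBytes) total += b;
        double avg = (double) total / chunkBytes.size();
        List<Long> out = new ArrayList<>();
        for (long b : chunkBytes) {
            if (b > 2 * avg) {
                out.add(b / 2);
                out.add(b - b / 2);
            } else {
                out.add(b);
            }
        }
        return out;
    }
}
```

In the real job the output would be new key boundaries rather than byte counts, but the termination logic is the same.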
So this is going to take a while, since the real partitioning problem is
NP-complete. I read about the Kernighan–Lin algorithm, which is a heuristic:
http://en.wikipedia.org/wiki/Kernighan%E2%80%93Lin_algorithm
This is actually an edge-weight partitioning; we should replace the edge
weight with the size of the row containing the edge.
Maybe you have a better (or at least faster) idea?
The next problem is the repartitioning:
If you are using Dijkstra, you can simply remove the already visited
lowest-weighted vertex. Its row then becomes overhead for the grooms in the
next step. If you are unlucky, every adjacency row in Groom1 will be
dismissed and Groom2 has to do all the work on its own.
So to avoid this, you have to keep track of the boundaries that are passed to
the grooms before each new step.
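The bookkeeping described above amounts to tracking how many unvisited rows each groom still owns and flagging when one groom carries a disproportionate share. A minimal sketch, assuming a 2x-of-average imbalance threshold (an assumption, not anything from Hama):

```java
// Illustrative load bookkeeping: flag when the remaining (unvisited)
// adjacency rows are skewed toward one groom, so a repartitioning of
// the boundaries can be triggered before the next step.
public class LoadTracker {

    // remaining[i] = unvisited adjacency rows still assigned to groom i
    public static boolean needsRepartitioning(int[] remaining) {
        int total = 0, max = 0;
        for (int r : remaining) {
            total += r;
            max = Math.max(max, r);
        }
        if (total == 0) return false;   // nothing left to do
        double avg = (double) total / remaining.length;
        return max > 2 * avg;           // one groom is overloaded
    }
}
```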
A big advantage of Google Pregel is the caching (I didn't actually read the
paper, but I read about it somewhere else): every "Pregel groom" has a really
large amount of memory to cache the chunk it is working on.
So we should not repartition too often, but this is currently just a thought,
so skip that caching stuff.
This is a cool page too:
http://www.eecs.berkeley.edu/~demmel/cs267/lecture18/lecture18.html#link_4.2
> Development of Shortest Path Finding Algorithm
> ----------------------------------------------
>
> Key: HAMA-359
> URL: https://issues.apache.org/jira/browse/HAMA-359
> Project: Hama
> Issue Type: New Feature
> Components: examples
> Affects Versions: 0.2.0
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Labels: gsoc, gsoc2011, mentor
> Fix For: 0.3.0
>
> Original Estimate: 2016h
> Remaining Estimate: 2016h
>
> The goal of this project is development of parallel algorithm for finding a
> Shortest Path using Hama BSP.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira