That's right. So, firstly I'm researching on graph partitioning using
MapReduce and BSP.

The reason why we don't use key-value pairs is that generally graph
algorithms is based on dynamic programming. Thus, we intend to use messages
transmitted from one to others. It is more intuitive for many graph
algorithms. It provides convenient ways to convert traditional graph
algorithms to BSP version algorithms.
--
Hyunsik Choi
Database & Information Systems Group, Korea Univ.
http://diveintodata.org


On Wed, Feb 3, 2010 at 11:08 PM, Felix Halim <[email protected]> wrote:

> Of course the second problem is how to partition the graph so that the
> sync is minimized...
> Because as far as I know, when the data has been partitioned to some
> machine,
> it never moves from those machine (other machines get them through
> sync), am I right?.
> Or maybe the data may be moved among machines if it leads to lesser
> number of sync?
>
> Anyway, I'm currently researching on running max-flow algorithm on top
> of MapReduce. So I was looking whether BSP is a good add on. It turns
> out that Hama's BSP is quite different from MapReduce. Not exactly in
> form of map() and reduce() function anymore...
>
> I was thinking BSP as an add on to MapReduce: In current MR, the map
> function is supplied one <key,value> at a time. If the map() function
> is modified so that it takes in a RANGE of <key,value> pairs of
> certain range, then in some sense it becomes BSP (as explained in my
> comment here:
> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html#comments
> ).
>  However, Hama's BSP seemed to take a different approach...
>
> Felix Halim
>
> On Wed, Feb 3, 2010 at 7:08 PM, Hyunsik Choi <[email protected]>
> wrote:
> > Hi Felix Halim,
> >
> > Such terrible global sync problem may only occurs in a few of problems
> > that have global states. We are trying to research better ways.
> > However, many problems (e.g., finding isomorphic subgraphs, computing
> > clustering coefficient, and finding cohesive groups) that have global
> > states don't have such global sync problems. The graph package of Hama
> > is appropriate to these problems.
> >
> > We will think about PathOthersHaveFollowed(). Thank you for your
> > advice! If you are interested in this project, why don't you
> > participate in our project?
> >
> > Best regards,
> > --
> > Hyunsik Choi
> > Database & Information Systems Group, Korea Univ.
> > http://diveintodata.org
> >
> >
> > On Wed, Feb 3, 2010 at 5:54 PM, Felix Halim <[email protected]>
> wrote:
> >>
> >> On Wed, Feb 3, 2010 at 4:41 PM, Edward J. Yoon <[email protected]>
> wrote:
> >> >>> I think the needToVisit() function might as well need to communicate
> >> >>> with other machine:
> >> >
> >> > Hmm, You're exactly right. In that example, needToVisit() function
> >> > checks the IsVisited from some shared-space (e.g., HBase or DBMS, ...,
> >> > etc). We wrote with intent to simplify it.
> >>
> >> From the pseudocode, I see that every Vertex will request IsVisited.
> >> In a large graph, the HBase or DBMS will be overwhelmed by many tiny
> >> requests from each Vertex.
> >> Does the needToVisit() has "bulk query" that aggregates the tiny
> >> requests into a single request?
> >>
> >>
> >> > Integer is the distance at "Map<Vertex, Integer> input, Map<Vertex,
> >> > Integer> nextQueue", but it could be replaced as other object. for
> >> > example, new PathWeHaveFollowed(). Then, perhaps we need not some
> >> > shared-space.
> >>
> >> But we also need to know PathOthersHaveFollowed() don't we?
> >> That's why we need HBase to store the global "visited" states of each
> node.
> >>
> >> Felix Halim
> >
>

Reply via email to