Re: More documentation on Hama- BSP

Felix Halim Wed, 03 Feb 2010 06:08:58 -0800

Of course the second problem is how to partition the graph so that the
sync is minimized...
Because as far as I know, when the data has been partitioned to some machine,
it never moves from those machine (other machines get them through
sync), am I right?.
Or maybe the data may be moved among machines if it leads to lesser
number of sync?


Anyway, I'm currently researching on running max-flow algorithm on top
of MapReduce. So I was looking whether BSP is a good add on. It turns
out that Hama's BSP is quite different from MapReduce. Not exactly in
form of map() and reduce() function anymore...

I was thinking BSP as an add on to MapReduce: In current MR, the map
function is supplied one <key,value> at a time. If the map() function
is modified so that it takes in a RANGE of <key,value> pairs of
certain range, then in some sense it becomes BSP (as explained in my
comment here: 
http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html#comments).
 However, Hama's BSP seemed to take a different approach...

Felix Halim

On Wed, Feb 3, 2010 at 7:08 PM, Hyunsik Choi <[email protected]> wrote:
> Hi Felix Halim,
>
> Such terrible global sync problem may only occurs in a few of problems
> that have global states. We are trying to research better ways.
> However, many problems (e.g., finding isomorphic subgraphs, computing
> clustering coefficient, and finding cohesive groups) that have global
> states don't have such global sync problems. The graph package of Hama
> is appropriate to these problems.
>
> We will think about PathOthersHaveFollowed(). Thank you for your
> advice! If you are interested in this project, why don't you
> participate in our project?
>
> Best regards,
> --
> Hyunsik Choi
> Database & Information Systems Group, Korea Univ.
> http://diveintodata.org
>
>
> On Wed, Feb 3, 2010 at 5:54 PM, Felix Halim <[email protected]> wrote:
>>
>> On Wed, Feb 3, 2010 at 4:41 PM, Edward J. Yoon <[email protected]> wrote:
>> >>> I think the needToVisit() function might as well need to communicate
>> >>> with other machine:
>> >
>> > Hmm, You're exactly right. In that example, needToVisit() function
>> > checks the IsVisited from some shared-space (e.g., HBase or DBMS, ...,
>> > etc). We wrote with intent to simplify it.
>>
>> From the pseudocode, I see that every Vertex will request IsVisited.
>> In a large graph, the HBase or DBMS will be overwhelmed by many tiny
>> requests from each Vertex.
>> Does the needToVisit() has "bulk query" that aggregates the tiny
>> requests into a single request?
>>
>>
>> > Integer is the distance at "Map<Vertex, Integer> input, Map<Vertex,
>> > Integer> nextQueue", but it could be replaced as other object. for
>> > example, new PathWeHaveFollowed(). Then, perhaps we need not some
>> > shared-space.
>>
>> But we also need to know PathOthersHaveFollowed() don't we?
>> That's why we need HBase to store the global "visited" states of each node.
>>
>> Felix Halim
>

Re: More documentation on Hama- BSP

Reply via email to