The graph my BFS algorithm runs on needs only 4 levels to reach all nodes. The reason I say "many iterations" is that there are 100 source nodes, so 400 iterations in total. The algorithm should be correct, and I cannot think of a way to reduce the number of iterations.
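For concreteness, my driver is just a loop that submits one Hadoop job per (seed, level). Below is a rough sketch of its shape in Hadoop Streaming style, not my exact code; the paths and the mapper.py/reducer.py scripts are placeholders:

import subprocess

def run_level(inp, out):
    # Submit one BFS-level job via Hadoop Streaming. mapper.py and
    # reducer.py stand in for the scripts that expand the frontier.
    subprocess.check_call([
        "hadoop", "jar", "hadoop-streaming.jar",
        "-input", inp, "-output", out,
        "-mapper", "mapper.py", "-reducer", "reducer.py",
        "-file", "mapper.py", "-file", "reducer.py",
    ])

seeds = [s.strip() for s in open("seeds.txt") if s.strip()]
LEVELS = 4                              # my test graph needs 4 levels

for seed in seeds:
    frontier = "bfs/%s/level0" % seed   # adjacency list marked with the seed
    for level in range(1, LEVELS + 1):
        out = "bfs/%s/level%d" % (seed, level)
        run_level(frontier, out)        # each level's output feeds the next
        frontier = out

# 100 seeds x 4 levels = 400 jobs (plus the adjacency-list job). At about
# 3 hours total, that is roughly 27 seconds of startup overhead per job,
# which dwarfs the real work on a 100-node graph.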
Ted, I will check out the links you sent. I really appreciate your help.

Wei

-----Original Message-----
From: Edward J. Yoon [mailto:[email protected]]
Sent: Tuesday, December 21, 2010 1:27 AM
To: [email protected]
Subject: Re: breadth-first search

There's no release yet, but I have tested BFS using Hama and HBase.

Sent from my iPhone

On 2010. 12. 21., at 11:30 AM, "Peng, Wei" <[email protected]> wrote:

> Yoon,
>
> Can I use HAMA now, or is it still in development?
>
> Thanks
>
> Wei
>
> -----Original Message-----
> From: Edward J. Yoon [mailto:[email protected]]
> Sent: Monday, December 20, 2010 6:23 PM
> To: [email protected]
> Subject: Re: breadth-first search
>
> Check this slide out -
> http://people.apache.org/~edwardyoon/papers/Apache_HAMA_BSP.pdf
>
> On Tue, Dec 21, 2010 at 10:49 AM, Peng, Wei <[email protected]> wrote:
>>
>> I implemented an algorithm that runs Hadoop on 25 GB of graph data to
>> calculate its average separation length.
>> The input format is V1(tab)V2 (where V2 is a friend of V1).
>> My goal is to first randomly select some seed nodes, and then, for each
>> seed node, calculate the shortest paths from that node to all other
>> nodes in the graph.
>>
>> To do this, I first run a simple Python script on a single machine to
>> pick some random seed nodes.
>> Then I run a Hadoop job to generate the adjacency list for each node as
>> the input for the second job.
>>
>> The second job takes the adjacency-list input and outputs the first
>> level of the breadth-first search: the nodes that are friends of the
>> seed node get distance 1. That output is then the input for the next
>> Hadoop job, and so forth, until all nodes have been reached.
>>
>> I generated a simulated graph for testing. This data has only 100
>> nodes. Ordinary Python code can find the separation length within 1
>> second (100 seed nodes). However, Hadoop took almost 3 hours to do the
>> same thing (pseudo-distributed mode on one machine)!!
>>
>> I wonder if there is a more efficient way to do breadth-first search in
>> Hadoop? It is very inefficient to output so many intermediate results.
>> In total there would be seedNodeNumber*levelNumber+1 jobs and
>> seedNodeNumber*levelNumber intermediate files. Why is Hadoop so slow?
>>
>> Please help. Thanks!
>>
>> Wei
>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> [email protected]
> http://blog.udanax.org
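P.S. For anyone reading the archive: here is a minimal sketch of one frontier-expansion step as a Hadoop Streaming mapper/reducer pair. The record format node(tab)dist(tab)nbr1,nbr2,... (with dist = -1 for not-yet-reached nodes) is invented for illustration; it is not the exact format of my jobs.

#!/usr/bin/env python
# mapper.py -- expand the BFS frontier by one level.
# Input records: node<TAB>dist<TAB>nbr1,nbr2,...  (dist = -1 if unreached)
import sys

for line in sys.stdin:
    node, dist, nbrs = line.rstrip("\n").split("\t")
    d = int(dist)
    print("%s\t%s\t%s" % (node, dist, nbrs))      # pass the node through
    if d >= 0:                                    # reached: propose dist+1
        for nbr in nbrs.split(","):
            if nbr:
                print("%s\t%d\t" % (nbr, d + 1))  # candidate, no adj list

#!/usr/bin/env python
# reducer.py -- per node, keep the smallest proposed distance and the
# adjacency list (streaming sorts by key, so a node's records arrive
# grouped together).
import sys

def emit(node, dist, nbrs):
    print("%s\t%d\t%s" % (node, dist, nbrs))

cur, best, adj = None, -1, ""
for line in sys.stdin:
    node, dist, nbrs = line.rstrip("\n").split("\t")
    d = int(dist)
    if node != cur:
        if cur is not None:
            emit(cur, best, adj)
        cur, best, adj = node, d, nbrs
        continue
    if d >= 0 and (best < 0 or d < best):
        best = d                                  # smaller distance wins
    if nbrs:
        adj = nbrs                                # carry adjacency forward
if cur is not None:
    emit(cur, best, adj)

The reducer relies on streaming's sort-by-key to group each node's records, keeps the smallest distance proposed for it, and carries the adjacency list forward; this is the usual way to express one BFS level as a single MapReduce pass.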
