On Oct 16, 2008, at 1:52 PM, Bhupesh Bansal wrote:

We at LinkedIn are trying to run some large graph analysis problems on
Hadoop. The fastest way to run them would be to keep a copy of the whole graph in RAM at all mappers. (The graph is about 8G in RAM.) We have a cluster of 8-core
machines with 8G on each.

The best way to deal with it is *not* to load the entire graph in one process. In the WebMap at Yahoo, we have a graph of the web that has roughly 1 trillion links and 100 billion nodes. See http://tinyurl.com/4fgok6 . To invert the links, you process the graph in pieces and re-sort based on the target. You'll get much better performance and scale to almost any size.
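
As a rough illustration of "process in pieces and re-sort based on the target", here is a minimal sketch in plain Java with no Hadoop APIs. The class name LinkInverter, the long[]{source, target} link layout, and the modulo partitioning are all assumptions made for the example; in a real MapReduce job the framework's shuffle would do the partitioning and sorting between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each link is a long[]{source, target}.
public class LinkInverter {

    /** "Map" side: route every link to a piece chosen by its target id. */
    public static List<List<long[]>> partitionByTarget(List<long[]> links, int pieces) {
        List<List<long[]>> out = new ArrayList<>();
        for (int i = 0; i < pieces; i++) {
            out.add(new ArrayList<>());
        }
        for (long[] link : links) {
            // All links with the same target land in the same piece.
            int piece = (int) Math.floorMod(link[1], (long) pieces);
            out.get(piece).add(link);
        }
        return out;
    }

    /** "Reduce" side: within one piece, order by target and collect the sources. */
    public static Map<Long, List<Long>> invertPiece(List<long[]> piece) {
        // TreeMap keeps the targets sorted, standing in for the re-sort step.
        Map<Long, List<Long>> inLinks = new TreeMap<>();
        for (long[] link : piece) {
            inLinks.computeIfAbsent(link[1], k -> new ArrayList<>()).add(link[0]);
        }
        return inLinks;
    }
}
```

Each piece only has to hold the links whose targets hash to it, so no single process ever needs the whole graph in memory.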

What is the best way of doing that? Is there a way for multiple
mappers on the same machine to access a shared RAM cache? I read about the Hadoop
distributed cache; it looks like it copies the file (HDFS / HTTP) locally onto
the slaves, but not necessarily into RAM?

You could mmap the file from the distributed cache using MappedByteBuffer. Then there will be one copy shared between the JVMs...
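
A minimal sketch of the mmap approach, assuming the file has already been copied to a local path (for example by the distributed cache). The class name MappedGraphFile and the flat array-of-longs layout are hypothetical, chosen only to keep the example short. Note that a single FileChannel.map call covers at most Integer.MAX_VALUE bytes, so a multi-gigabyte file would need several mappings at different offsets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical layout: the file is a flat array of 8-byte big-endian longs.
public class MappedGraphFile {
    private final MappedByteBuffer buffer;

    public MappedGraphFile(Path file) {
        MappedByteBuffer mapped;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // One mapping is limited to Integer.MAX_VALUE bytes.
            long size = Math.min(channel.size(), Integer.MAX_VALUE);
            // The mapping stays valid after the channel is closed.
            mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.buffer = mapped;
    }

    /** Reads the index-th long straight from the mapped pages. */
    public long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }

    /** Writes a small sample file in the assumed layout, for trying the sketch out. */
    public static void writeSample(Path file, long[] values) {
        ByteBuffer bytes = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            bytes.putLong(v);
        }
        try {
            Files.write(file, bytes.array());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the mapping is read-only and backed by the operating system's page cache, several JVMs mapping the same local file can share the same physical pages rather than each holding a private heap copy.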

-- Owen
