Re: Distributed cache Design

Owen O'Malley Thu, 16 Oct 2008 15:02:27 -0700


On Oct 16, 2008, at 1:52 PM, Bhupesh Bansal wrote:

We at LinkedIn are trying to run some large graph analysis problems on
Hadoop. The fastest way to run would be to keep a copy of the whole graph in
RAM at all mappers. (The graph is about 8G in RAM.) We have a cluster of
8-core machines with 8G on each.

The best way to deal with it is *not* to load the entire graph in one
process. In the WebMap at Yahoo, we have a graph of the web that has
roughly 1 trillion links and 100 billion nodes. See
http://tinyurl.com/4fgok6. To invert the links, you process the graph in
pieces and resort based on the target. You'll get much better performance
and scale to almost any size.
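
As a rough illustration of "process the graph in pieces and resort based on the target", here is a minimal sketch in plain Java. It is not the WebMap code: the class name InvertLinks, the {source, target} edge layout, and the in-memory TreeMap are all assumptions made for the example. The TreeMap only stands in for the sort-by-key step that a MapReduce job would perform between map and reduce, where the grouped output would be spilled to disk and partitioned rather than held in one process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertLinks {
    /**
     * Inverts a graph that is supplied as a list of pieces.
     * Each piece is an array of edges, and each edge is {source, target}.
     * Returns, for every target, the list of sources that link to it.
     */
    public static Map<Long, List<Long>> invert(List<long[][]> pieces) {
        // TreeMap keeps the targets sorted; this is an in-memory stand-in
        // for the framework's sort/shuffle on the emitted key.
        Map<Long, List<Long>> inbound = new TreeMap<>();
        for (long[][] piece : pieces) {      // only one piece is scanned at a time
            for (long[] edge : piece) {
                long source = edge[0];
                long target = edge[1];
                // "Emit" (target, source): the edge is re-keyed by its target.
                inbound.computeIfAbsent(target, k -> new ArrayList<>()).add(source);
            }
        }
        return inbound;
    }
}
```

The point of the sketch is that no step needs the whole forward graph at once: each piece is read sequentially and its edges are simply re-keyed by target.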

What is the best way of doing that? Is there a way for multiple mappers
on the same machine to access a shared RAM cache? I read about the Hadoop
distributed cache; it looks like it copies the file (HDFS / HTTP) locally
onto the slaves, but not necessarily into RAM?

You could mmap the file from the distributed cache using MappedByteBuffer.
Then there will be one copy shared between the JVMs...
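
A minimal sketch of the mmap approach, assuming the distributed cache has already placed the file on local disk and that it holds fixed-width 8-byte big-endian values. The class name MappedGraph, the chunkBytes parameter, and the file layout are hypothetical, and the code uses java.nio.file from a newer JDK than was current in 2008. Because a single FileChannel.map call cannot cover more than Integer.MAX_VALUE bytes, a multi-gigabyte file has to be mapped as several chunks.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedGraph {
    /**
     * Maps a local file read-only as a sequence of chunks.
     * chunkBytes must be a multiple of 8 and at most Integer.MAX_VALUE.
     */
    public static MappedByteBuffer[] mapReadOnly(Path localFile, long chunkBytes)
            throws IOException {
        try (FileChannel ch = FileChannel.open(localFile, StandardOpenOption.READ)) {
            long size = ch.size();
            int n = (int) ((size + chunkBytes - 1) / chunkBytes);
            MappedByteBuffer[] chunks = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long start = i * chunkBytes;
                long len = Math.min(chunkBytes, size - start);
                // The mapping stays valid after the channel is closed.
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, len);
            }
            return chunks;
        }
    }

    /** Reads the 8-byte value at an 8-byte-aligned offset into the file. */
    public static long readLong(MappedByteBuffer[] chunks, long chunkBytes, long offset) {
        // Absolute get: does not move the buffer position.
        return chunks[(int) (offset / chunkBytes)].getLong((int) (offset % chunkBytes));
    }
}
```

Each task JVM on a node would call mapReadOnly on the same local path; the operating system backs all of those read-only mappings with the same page-cache pages, so the data is resident once per machine rather than once per JVM.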


-- Owen
