Are there statistics available to monitor what percentage of the pairs remains in memory, and what percentage was written to disk? Or what are these exceptional cases that you mention?
Hadoop goes to some lengths to make sure that things can stay in memory as much as possible. There are still cases, however, where intermediate results are normally written to disk. That means that implementors will have those time scales in their heads as they do things, which will inevitably make the trade-offs somewhat poor compared to a system that never envisions intermediate data being written to disk.
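To make the in-memory-versus-disk trade-off concrete: the map-side sort buffer and its spill threshold are among the knobs that control when intermediate map output gets spilled to disk. A minimal illustrative sketch, assuming the `io.sort.mb` and `io.sort.spill.percent` parameter names from Hadoop's map-side sort configuration (whether they apply, and their defaults, depends on your Hadoop version; the values below are hypothetical):

```xml
<!-- hadoop-site.xml fragment (illustrative values; tune to your task heap size) -->
<property>
  <name>io.sort.mb</name>
  <value>200</value>
  <!-- size in MB of the in-memory buffer used to sort map output
       before it is spilled to disk -->
</property>
<property>
  <name>io.sort.spill.percent</name>
  <value>0.90</value>
  <!-- fraction of that buffer that may fill before a background
       spill to disk is started -->
</property>
```

On the statistics question: if your version exposes it, the per-job "Spilled Records" counter (shown in the JobTracker web UI alongside the other job counters) is the closest thing to a direct measure of how much intermediate data went to disk.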
But other than guessing like this, I couldn't actually say how it would turn out, except that for very short jobs, moving jar files around and other startup costs can be the dominant cost.
On Sun, Jun 1, 2008 at 5:05 AM, Martin Jaggi <[EMAIL PROTECTED]> wrote:
So in the case that all intermediate pairs fit into the RAM of the cluster, does the InMemoryFileSystem already allow the intermediate phase to be done without much disk access? Or what, in your opinion, would be the current bottleneck in Hadoop in this scenario (huge computational load, not so much data in/out)?