Hello, I'm quite new to Hadoop, so I'd like to get an understanding of something.
Let's say I have a task that requires 16 GB of memory to execute — hypothetically, some sort of big lookup table that needs that much memory. On one machine, I could have 8 cores run the task in parallel (multithreaded), with all 8 cores sharing that single 16 GB lookup table. On another machine, I could have 4 cores run the same task, again sharing one 16 GB table.

Now, as I understand Hadoop, each task has its own memory. So if I run 4 tasks on one machine and 8 tasks on another, the 4 tasks need a 64 GB machine and the 8 tasks need a 128 GB machine. But suppose I only have those two machines — one with 4 cores and one with 8 — and each has only 24 GB. How can the work be evenly distributed among these machines? Am I missing something? What other ways can this be configured so that it works properly?

Thanks,
Ian
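P.S. To make concrete what I mean by "sharing" — here's a minimal Python sketch (a tiny dict stands in for the 16 GB table; all names are just for illustration). Every worker thread reads the same table object, so the memory cost is one copy no matter how many cores run the task:

```python
import threading

# Stand-in for the big lookup table: in the shared-memory model described
# above, all worker threads read the SAME object, so memory cost is one
# copy regardless of thread count.
lookup_table = {i: i * i for i in range(1000)}  # imagine this is 16 GB

results = []
results_lock = threading.Lock()

def worker(keys):
    # Each thread queries the single shared table; no per-thread copy.
    local = [lookup_table[k] for k in keys]
    with results_lock:
        results.extend(local)

# 8 "cores" sharing one table, each handling a slice of the keys.
threads = [threading.Thread(target=worker, args=(range(t * 10, t * 10 + 10),))
           for t in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 80 lookups served from the single shared copy
```

My question is essentially whether Hadoop tasks can do something like this, instead of each task process holding its own private copy of the table.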
