Hello, I'm quite new to Hadoop, so I'd like to get an understanding of something.
Let's say I have a task that requires 16 GB of memory to execute — hypothetically, some sort of big lookup table that needs that much memory. On one machine, I could have 8 cores run the task in parallel (multithreaded), with all 8 cores sharing that single 16 GB lookup table. On another machine, I could have 4 cores run the same task, again sharing one 16 GB table.

Now, as I understand Hadoop, each task has its own memory. So if I run 4 tasks on one machine and 8 tasks on another, the 4 tasks need a 64 GB machine and the 8 tasks need a 128 GB machine. But suppose I only have those two machines — one with 4 cores and one with 8 — and each has only 24 GB. How can the work be evenly distributed among these machines? Am I missing something? What other ways can this be configured so that it works properly?

Thanks,
Ian
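P.S. To make concrete what I mean by "sharing" — here's a minimal Python sketch (a tiny dict stands in for the 16 GB table; all names are just for illustration). Every worker thread reads the same table object, so the memory cost is one copy no matter how many cores run the task:

```python
import threading

# Stand-in for the big lookup table: in the shared-memory model described
# above, all worker threads read the SAME object, so memory cost is one
# copy regardless of thread count.
lookup_table = {i: i * i for i in range(1000)}  # imagine this is 16 GB

results = []
results_lock = threading.Lock()

def worker(keys):
    # Each thread queries the single shared table; no per-thread copy.
    local = [lookup_table[k] for k in keys]
    with results_lock:
        results.extend(local)

# 8 "cores" sharing one table, each handling a slice of the keys.
threads = [threading.Thread(target=worker, args=(range(t * 10, t * 10 + 10),))
           for t in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 80 lookups served from the single shared copy
```

My question is essentially whether Hadoop tasks can do something like this, instead of each task process holding its own private copy of the table.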
