Hello all,

I am looking for the right thing to read...

I am writing a MapReduce Speech Recognition application. I want to run many Speech Recognizers in parallel.

Speech Recognizers not only use a lot of CPU, they also use a large amount of memory. On top of that, in my application they spend much of their time idle, waiting for data, so optimizing what runs when is non-trivial.

I am trying to better understand how Hadoop manages resources. Does it automatically figure out the right number of mappers to instantiate? How? What happens when other people are sharing the cluster? What resource management is the responsibility of application developers?
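To make the question concrete, here is roughly what I imagine the job driver would look like, assuming a YARN-based cluster where mapreduce.map.memory.mb / mapreduce.map.java.opts are the right knobs to declare per-task memory (RecognizerMapper is hypothetical; the real map() would wrap a recognizer):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecognizerJob {

    // Hypothetical mapper: one recognizer instance per map task.
    public static class RecognizerMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        // setup() would load the ~500 MB recognizer model once per task.

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Placeholder: the real code would decode audio and emit a transcript.
            context.write(new Text(key.toString()), new Text("TRANSCRIPT"));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Declare that each map task needs roughly 1 GB (500 MB model plus overhead).
        conf.setInt("mapreduce.map.memory.mb", 1024);
        conf.set("mapreduce.map.java.opts", "-Xmx900m");

        Job job = Job.getInstance(conf, "speech-recognition");
        job.setJarByClass(RecognizerJob.class);
        job.setMapperClass(RecognizerMapper.class);
        job.setNumReduceTasks(0); // map-only: each mapper just decodes its input
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Is declaring memory like this all that is expected of me, or does the scheduler need more information than that?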

For example, let's say each Speech Recognizer uses 500 MB and I have 1,000,000 files to process. What would happen if I created 1,000,000 mappers, each with one Speech Recognizer? Would that be suboptimal only because of task setup overhead, or would the system try to allocate 500 TB of memory and explode?
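In case it clarifies the question, the alternative I am considering instead of one mapper per file is packing many files into each split, along these lines (CombineTextInputFormat and the 128 MB figure are guesses on my part, and my inputs are audio rather than text, so the real input format would have to differ):

import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

// In the driver above, before submitting the job: pack many small
// input files into each split so the job does not create 1,000,000 map tasks.
job.setInputFormatClass(CombineTextInputFormat.class);
CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024); // ~128 MB per split

Is that the intended mechanism, or does the framework already handle this for me?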

Thank you in advance
Peter
