Was reading up a bit today on configuring the settings for the number of task slots, namely:

mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
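(For reference, today these get set statically per TaskTracker, typically in mapred-site.xml; the values below are just placeholders, not recommendations:)

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>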

I was just wondering: couldn't (shouldn't?) this be done dynamically by default? That is, couldn't/shouldn't a slave node compute these values programmatically based on the number of cores in the machine? (Perhaps in conjunction with a mappers-to-reducers ratio and an over-subscription factor.)

Obviously there'd be times when you'd want to manually override that, but I'd think there could be a simple algorithm for computing this (e.g., based on the info in slide #8 of this presentation: http://www.slideshare.net/ydn/hadoop-summit-2010-tuning-hadoop-to-deliver-performance-to-your-application) that would cover most users' main use case.
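Just to make the idea concrete, here's a rough sketch of what such a calculation might look like (standalone Java, nothing that exists in Hadoop today; the 2:1 map-to-reduce split and 1.25x over-subscription factor are made-up defaults for illustration, not numbers taken from the slides):

    // Hypothetical sketch: derive per-node slot counts at TaskTracker startup.
    public class SlotCalculator {

        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();

            // Over-subscribe the cores a bit, since tasks spend part of their
            // time blocked on I/O rather than using CPU.
            double overSubscription = 1.25;
            int totalSlots = Math.max(2, (int) Math.round(cores * overSubscription));

            // Split the slots between map and reduce with a fixed ratio,
            // e.g. roughly 2 map slots for every reduce slot.
            int mapSlots = Math.max(1, (totalSlots * 2) / 3);
            int reduceSlots = Math.max(1, totalSlots - mapSlots);

            System.out.println("mapred.tasktracker.map.tasks.maximum = " + mapSlots);
            System.out.println("mapred.tasktracker.reduce.tasks.maximum = " + reduceSlots);
        }
    }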

Thoughts? Is there something I'm overlooking here that would make this unworkable?

Thanks,

DR
