I was reading up a bit today on configuring the settings for the number of
task slots, namely:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
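For context, those currently have to be hard-coded per node in
mapred-site.xml, something like this (the values here are just
placeholders for illustration):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>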
I was just wondering: couldn't (shouldn't?) this be done dynamically by
default? That is, couldn't a slave node compute these values
programmatically based on the number of cores in the machine? (Perhaps in
conjunction with a mappers-to-reducers ratio and an over-subscription
percentage.)
Obviously there'd be times when you'd want to override that manually, but
I'd think a simple algorithm for computing it (e.g., based on the info in
slide #8 of this presentation:
http://www.slideshare.net/ydn/hadoop-summit-2010-tuning-hadoop-to-deliver-performance-to-your-application)
would cover most users' main use case.
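To make that concrete, here's a rough sketch of the kind of heuristic I
have in mind. The class name and the specific ratios (2:1 maps to reduces,
25% over-subscription) are made-up defaults for illustration, not anything
taken from the slides:

    // Hypothetical sketch: derive per-node slot counts from the core count.
    public class SlotHeuristic {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();

            double overSubscription = 1.25;  // run slightly more tasks than cores
            double mapFraction = 2.0 / 3.0;  // ~2 map slots for every reduce slot

            int totalSlots = (int) Math.ceil(cores * overSubscription);
            int mapSlots = Math.max(1, (int) Math.round(totalSlots * mapFraction));
            int reduceSlots = Math.max(1, totalSlots - mapSlots);

            System.out.println("mapred.tasktracker.map.tasks.maximum=" + mapSlots);
            System.out.println("mapred.tasktracker.reduce.tasks.maximum=" + reduceSlots);
        }
    }

A TaskTracker could run something like this at startup and only fall back
to it when the properties aren't explicitly set in the config.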
Thoughts? Is there something I'm overlooking here that would make this
unworkable?
Thanks,
DR