Hi folks,
Sorry to cut across this discussion, but I'm experiencing similar
confusion about where to change some parameters.
In particular, I'm not entirely clear on how the following should be
used, so clarification is welcome (I'm happy to pull some of this
together in a blog post once I get some clarity).
In hadoop/conf/hadoop-site.xml:
hadoop.tmp.dir
When submitting a job from a client (not one of the Hadoop cluster
machines), does this specify a directory local to the client in which
Hadoop creates temporary files, or a directory on each Hadoop machine
on which the job runs? I notice that the Cloudera configurator
specifies this as /tmp/hadoop-${user.name}. This seems like a nice
approach, but is it safe for this directory to be blown away when a
machine is rebooted?
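The ${user.name} in that value is not literal. Hadoop's Configuration class expands ${...} references when a property is read, resolving them from Java system properties or from other configuration properties. Below is a minimal Python sketch of that kind of substitution. It assumes a plain dict stands in for the property lookup, and expand_vars is a hypothetical helper, not Hadoop's code:

```python
import re

# Matches ${name}, capturing the name between the braces.
_VAR = re.compile(r"\$\{([^}$]+)\}")

def expand_vars(value, props):
    """Replace each ${name} in value with props[name].

    Hypothetical stand-in for Hadoop's variable substitution; in this
    sketch, names missing from props are left unexpanded.
    """
    def lookup(match):
        return props.get(match.group(1), match.group(0))
    return _VAR.sub(lookup, value)

tmp_dir = expand_vars("/tmp/hadoop-${user.name}", {"user.name": "stephen"})
print(tmp_dir)  # /tmp/hadoop-stephen
```

Because the expanded path includes the user name, each user who runs Hadoop on a machine ends up with a separate directory under /tmp.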
mapred.child.java.opts (-Xmx) and mapred.child.ulimit
Presumably these should be set quite differently on the namenode, the
data nodes and the client machine (assuming these are different
machines). For the namenode and data nodes, I assume they should be set
quite large. For the client, should they be set so that the number of
tasks * allocated memory is roughly equal to the amount of memory free
on each data node?
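The sizing rule proposed above (number of tasks * per-task heap roughly equal to free memory on a data node) is simple arithmetic. Here is a minimal Python sketch of it. The helper names and the node sizes are hypothetical:

```python
import re

def xmx_to_mb(java_opts):
    """Return the -Xmx heap size in MB from a JVM options string, or None."""
    m = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if m is None:
        return None
    size, unit = int(m.group(1)), m.group(2).lower()
    if unit == "g":
        return size * 1024
    if unit == "m":
        return size
    if unit == "k":
        return size // 1024
    return size // (1024 * 1024)  # no suffix: the JVM reads the value as bytes

def max_concurrent_tasks(free_mb, heap_mb_per_task):
    """Largest number of tasks whose combined heap fits in free_mb.

    Counts heap only; a JVM also uses non-heap memory, so a real budget
    needs some headroom on top of this.
    """
    if heap_mb_per_task <= 0:
        raise ValueError("heap_mb_per_task must be positive")
    return free_mb // heap_mb_per_task

# Hypothetical data node with 7 GB free, tasks launched with -Xmx512m.
print(max_concurrent_tasks(7 * 1024, xmx_to_mb("-Xmx512m")))  # 14
```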
mapred.map.tasks and mapred.reduce.tasks
My understanding is that on the namenode and data nodes these should be
set to the number of cores or less. Is that correct? For the client,
should they be bumped closer to the total number of cores available in
the overall cluster?
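The two figures contrasted above (a per-node count at or below the core count, and a cluster-wide total) can be computed as follows for a homogeneous cluster. This is a minimal sketch. It assumes one task per core, and reserved_cores is a hypothetical knob representing the "or less" part, for example cores held back for the daemons:

```python
def task_counts(nodes, cores_per_node, reserved_cores=0):
    """Return (per-node task count, cluster-wide task count).

    Assumes one task per core on every node, minus reserved_cores
    (a hypothetical allowance for other processes on the node).
    """
    per_node = max(1, cores_per_node - reserved_cores)
    return per_node, nodes * per_node

print(task_counts(10, 4))     # (4, 40)
print(task_counts(10, 4, 1))  # (3, 30)
```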
mapred.tasktracker.tasks.maximum
Does this act as a cap on mapred.map.tasks and mapred.reduce.tasks? Is
it necessary to set this as well as mapred.map.tasks and
mapred.reduce.tasks?
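Suppose mapred.tasktracker.tasks.maximum does act as a per-node cap, as the question supposes. Then the number of tasks running at once is bounded by both the job's task count and the cluster's total slots. Below is a minimal sketch of that assumed model; the function and parameter names are hypothetical. Tasks beyond the available slots wait for a later "wave":

```python
import math

def task_waves(job_tasks, nodes, per_node_max):
    """Return (tasks running at once, number of waves) under a per-node cap.

    Assumed model: each node runs at most per_node_max tasks concurrently,
    and queued tasks start as earlier ones finish (equal-length tasks).
    """
    slots = nodes * per_node_max
    if slots <= 0:
        raise ValueError("the cluster needs at least one slot")
    running = min(job_tasks, slots)
    waves = math.ceil(job_tasks / slots)
    return running, waves

print(task_waves(100, 10, 4))  # (40, 3)
```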
Finally, in hadoop/conf/hadoop-env.sh:
export HADOOP_HEAPSIZE=xxxx
Should this normally be changed? If so, how large should it be? 50% of
total system memory?
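HADOOP_HEAPSIZE is given in MB (the stock hadoop-env.sh comment describes it that way, with a default of 1000). The 50% figure above can therefore be turned into an export line mechanically. This is a minimal sketch, and heapsize_line is a hypothetical helper:

```python
def heapsize_line(total_mb, fraction=0.5):
    """Build a hadoop-env.sh export line using `fraction` of total_mb.

    HADOOP_HEAPSIZE is read as MB; 0.5 mirrors the 50% figure asked about.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return f"export HADOOP_HEAPSIZE={int(total_mb * fraction)}"

print(heapsize_line(8 * 1024))  # export HADOOP_HEAPSIZE=4096
```

One caveat on the fraction: the bin/hadoop script applies this heap size to each Java daemon it launches. A node running several daemons could therefore reserve it more than once.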
Thanks for any input,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com