Hi folks,

Sorry to cut across this discussion but I'm experiencing some similar confusion about where to change some parameters.

In particular, I'm not entirely clear on how the following should be used - clarification welcome (I'm happy to pull some of this together on a blog once I get some clarity).

In hadoop/conf/hadoop-site.xml

hadoop.tmp.dir - when submitting a job from a client (not one of the hadoop cluster machines), does this specify a directory local to the client in which hadoop creates temporary files, or is it a directory on each hadoop machine on which the job runs? I notice that the cloudera configurator specifies this as /tmp/hadoop-${user.name}, which seems like a nice approach to use. Is it safe for this tmp.dir to be blown away when a machine is rebooted?

mapred.child.java.opts (-Xmx) and mapred.child.ulimit

Presumably these should be set totally differently on the namenode, the data nodes and the client machine (assuming those are different machines)? In the case of the namenode and data nodes, I assume they should be set quite large. In the case of the client, should they be set so that the number of tasks * allocated memory is roughly equal to the amount of memory free on each data node?
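
To make the arithmetic in that last question concrete, here is a rough Python sketch of the rule of thumb I have in mind. The function name and the figures (6000 MB free, 4 concurrent tasks) are made-up placeholders for illustration, not recommendations, and I don't know whether this is the right way to size things.

```python
def max_child_heap_mb(free_memory_mb: int, concurrent_tasks: int) -> int:
    """Largest per-task heap in MB such that tasks * heap <= free memory."""
    if concurrent_tasks <= 0:
        raise ValueError("concurrent_tasks must be positive")
    return free_memory_mb // concurrent_tasks


# Hypothetical data node: 6000 MB free, 4 tasks running at once.
heap_mb = max_child_heap_mb(6000, 4)

# Candidate -Xmx value for mapred.child.java.opts under that assumption.
print(f"-Xmx{heap_mb}m")
```

If that reasoning is wrong, for example because the memory has to be budgeted per tasktracker rather than per job, that is exactly the kind of correction I'm after.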

mapred.map.tasks and mapred.reduce.tasks

My understanding is that on the namenode and data nodes these should be set to the number of cores or less. Is that correct? For the client, should these be bumped closer to the total number of cores that are available in the overall cluster?

mapred.tasktracker.tasks.maximum

Does this work as a cap on mapred.map.tasks and mapred.reduce.tasks? Is it necessary to use this as well as mapred.map.tasks and mapred.reduce.tasks?


Finally, in hadoop/conf/hadoop-env.sh

export HADOOP_HEAPSIZE=xxxx

Should this be changed normally? If so, how large should it normally be? 50% of total system memory?

Thanks for any input,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com
