Matt,

You have 2 threads per core, so your Linux box thinks an 8-core box has 16 cores. In my calcs, I tend to take a whole core each for the TT, DN, and RS (TaskTracker, DataNode, and RegionServer) and then a thread per slot, so you end up with 10 slots per node: (8 - 3) cores * 2 threads. Of course, memory is also a factor.
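The arithmetic above can be sketched roughly as follows. This is only an illustration of the rule of thumb, not a sizing tool: the function names and the default of three reserved cores (one per daemon) are assumptions made for the example, and the heap check just multiplies slots by the per-child -Xmx value, ignoring what the daemons and the OS need.

```python
def slots_per_node(physical_cores, threads_per_core=2, reserved_cores=3):
    """Estimate task slots per node: one hardware thread per slot,
    after setting aside whole cores for the daemons.

    reserved_cores=3 is an assumption matching one core each for
    the TaskTracker, DataNode, and RegionServer.
    """
    if reserved_cores >= physical_cores:
        raise ValueError("no cores left for task slots")
    return (physical_cores - reserved_cores) * threads_per_core


def child_heap_needed_mb(slots, child_heap_mb):
    """Total heap (MB) if every slot runs a child JVM at its maximum heap.

    Hypothetical helper: it counts only the task JVMs, not the daemons.
    """
    return slots * child_heap_mb


if __name__ == "__main__":
    slots = slots_per_node(physical_cores=8)
    print("slots per node:", slots)
    print("child heap at -Xmx512m (MB):", child_heap_needed_mb(slots, 512))
```

If the heap total does not fit in the node's RAM alongside the daemons, memory rather than cores is the limit and the slot count has to come down.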
Note that this is only a starting point. You can always tune up.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 27, 2011, at 11:11 PM, "GOEKE, MATTHEW (AG/1000)" wrote:

> Per node: 4 cores * 2 processes = 8 slots
> Datanode: 1 slot
> Tasktracker: 1 slot
>
> Therefore max of 6 slots between mappers and reducers.
>
> Below is part of our mapred-site.xml. The thing to keep in mind is that the
> number of maps is defined by the number of input splits (which is defined by
> your data), so you only need to worry about setting the maximum number of
> concurrent processes per node. In this case the properties you want to home
> in on are mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum. Keep in mind there are a LOT of
> other tuning improvements that can be made, but they require a strong
> understanding of your job load.
>
> <configuration>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>2</value>
>   </property>
>
>   <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>1</value>
>   </property>
>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx512m</value>
>   </property>
>
>   <property>
>     <name>mapred.compress.map.output</name>
>     <value>true</value>
>   </property>
>
>   <property>
>     <name>mapred.output.compress</name>
>     <value>true</value>
>   </property>
>
