Mike,

I'm not sure I have seen a community consensus on how to handle 
hyper-threading within Hadoop (although I have seen quite a few articles that 
discuss it). I was assuming that when Juan said they were 4-core boxes he 
meant 4 physical cores and not HT cores. My point was that the starting point 
should be 1 slot per thread (i.e. per hyper-threaded core), but reviewing the 
results from Ganglia, or any other monitoring solution, will help you arrive 
at a more concrete configuration based on the load.

My brain might not be working this morning but how did you get the 10 slots 
again? That seems low for an 8 physical core box but somewhat overextending for 
a 4 physical core box.

Matt

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michel 
Segel
Sent: Tuesday, June 28, 2011 7:39 AM
To: [email protected]
Subject: Re: Performance Tunning

Matt,
You have 2 threads per core, so your Linux box thinks an 8 core box has 16 
cores. In my calcs, I tend to take a whole core each for the TT, DN and RS 
(TaskTracker, DataNode and RegionServer) and then a thread per slot, so you 
end up with 10 slots per node. Of course memory is also a factor.
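
For reference, the two starting-point calculations in this thread can be 
sketched as below. This is only one reading of the arithmetic: the function 
name and its parameters are made up for illustration, and it assumes 2 hardware 
threads per physical core throughout.

```python
def starting_slots(physical_cores, daemons, threads_per_daemon, threads_per_core=2):
    """Starting-point slot count: hardware threads minus those set aside for daemons."""
    hardware_threads = physical_cores * threads_per_core
    reserved_threads = daemons * threads_per_daemon
    return max(hardware_threads - reserved_threads, 0)

# 8 physical cores, a whole core (2 threads) each for TT, DN and RS.
whole_core_rule = starting_slots(8, daemons=3, threads_per_daemon=2)

# 4 physical cores, one thread each for the DataNode and TaskTracker.
one_thread_rule = starting_slots(4, daemons=2, threads_per_daemon=1)

print(whole_core_rule, one_thread_rule)  # prints "10 6"
```

Either number is a first guess for the combined map and reduce slots on a node, 
to be adjusted after watching the actual load.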

Note this is only a starting point. You can always tune up.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 27, 2011, at 11:11 PM, "GOEKE, MATTHEW (AG/1000)" 
<[email protected]> wrote:

> Per node: 4 cores * 2 processes = 8 slots
> Datanode: 1 slot
> Tasktracker: 1 slot
> 
> Therefore max of 6 slots between mappers and reducers.
> 
> Below is part of our mapred-site.xml. The thing to keep in mind is that the 
> number of maps is defined by the number of input splits (which is defined by 
> your data), so you only need to worry about setting the maximum number of 
> concurrent processes per node. In this case the properties you want to home 
> in on are mapred.tasktracker.map.tasks.maximum and 
> mapred.tasktracker.reduce.tasks.maximum. Keep in mind there are a LOT of 
> other tuning improvements that can be made, but they require a strong 
> understanding of your job load.
> 
> <configuration>
>  <property>
>    <name>mapred.tasktracker.map.tasks.maximum</name>
>    <value>2</value>
>  </property>
> 
>  <property>
>    <name>mapred.tasktracker.reduce.tasks.maximum</name>
>    <value>1</value>
>  </property>
> 
>  <property>
>    <name>mapred.child.java.opts</name>
>    <value>-Xmx512m</value>
>  </property>
> 
>  <property>
>    <name>mapred.compress.map.output</name>
>    <value>true</value>
>  </property>
> 
>  <property>
>    <name>mapred.output.compress</name>
>    <value>true</value>
>  </property>
> </configuration>
> 
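
As a rough check on the memory side of the settings quoted above, the child 
JVM heap that the configured slots can claim at once can be estimated as 
follows. This is a sketch only: heap_mb is a hypothetical helper, and the 
estimate ignores the daemons' own heaps and any non-heap JVM overhead.

```python
import re

def heap_mb(java_opts):
    """Extract the -Xmx value from a JVM options string, in megabytes."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx setting found")
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size

# Values taken from the quoted mapred-site.xml excerpt.
map_slots = 2        # mapred.tasktracker.map.tasks.maximum
reduce_slots = 1     # mapred.tasktracker.reduce.tasks.maximum
child_heap = heap_mb("-Xmx512m")  # mapred.child.java.opts

total_child_heap_mb = (map_slots + reduce_slots) * child_heap
print(total_child_heap_mb)  # prints 1536
```

If the slot maximums are raised, the same product shows how much more heap the 
node has to be able to spare for task JVMs.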