Hi,

We've finally got our Hadoop cluster up, some data to crunch, and a
map/reduce job.

After running a few configurations, I'm not sure about our performance
and would like to get some advice.

We have a 20-node EC2 cluster.
We have 750MB of data.
Currently our job seems to be progressing at about 1%/min on the cluster.
Using a much smaller subset of data and running locally, the job takes a
matter of seconds.
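
To put that rate in perspective, here is a rough back-of-the-envelope calculation. It assumes the 1%/min figure holds for the whole run and that the work is spread evenly over all nodes; both are assumptions, not something I've measured.

```python
# Rough throughput estimate from the figures quoted above.
data_mb = 750            # total input size
percent_per_min = 1.0    # observed progress rate (assumed constant for the whole run)
nodes = 20               # cluster size (assumes work is spread evenly)

# Time to reach 100% at the observed rate.
total_minutes = 100.0 / percent_per_min

# Aggregate and per-node throughput implied by that run time.
cluster_mb_per_sec = data_mb / (total_minutes * 60)
per_node_kb_per_sec = cluster_mb_per_sec * 1024 / nodes

print(f"estimated run time:  {total_minutes:.0f} min")
print(f"cluster throughput:  {cluster_mb_per_sec:.3f} MB/s")
print(f"per-node throughput: {per_node_kb_per_sec:.1f} KB/s")
```

That works out to roughly 100 minutes for the whole job, which is far slower than the local run would suggest.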


Here's our hadoop-site.xml:

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hadoop</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>** domain omitted **</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>** domain omitted **</value>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>100</value>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>15</value>
</property>

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>20</value>
</property>

</configuration>

Everything else is left at the defaults.

Thanks,

- Jonathan
