Hi guys!

I'd like some help fine tuning my cluster. I currently have 20 boxes exactly
alike. Single core machines with 600MB of RAM. No chance of upgrading the
hardware.

My cluster is made out of 1 NameNode/JobTracker box and 19
DataNode/TaskTracker boxes.

All my config is default except i've set the following in my mapred-site.xml
in an effort to try and prevent choking my boxes.
  *<property>*
*      <name>mapred.tasktracker.map.tasks.maximum</name>*
*      <value>1</value>*
*  </property>*

I'm running a MapReduce job which reads a Proxy Server log file (2GB), maps
hosts to each record and then in the reduce task it accumulates the amount
of bytes received from each host.

Currently it's producing about 65000 keys

The hole job takes forever to complete, specially the reduce part. I've
tried different tuning configs by I can't bring it down under 20mins.

Any ideas?

Thanks for your help!
Pony

Reply via email to