Hi guys! I'd like some help fine-tuning my cluster. I currently have 20 identical boxes: single-core machines with 600MB of RAM each. There is no chance of upgrading the hardware.
My cluster is made up of 1 NameNode/JobTracker box and 19 DataNode/TaskTracker boxes. All my config is default, except that I've set the following in my mapred-site.xml to try to prevent choking my boxes:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>

I'm running a MapReduce job that reads a proxy server log file (2GB). The map task maps each record to its host, and the reduce task accumulates the number of bytes received from each host. Currently it produces about 65,000 keys.

The whole job takes forever to complete, especially the reduce part. I've tried different tuning configs, but I can't bring it down under 20 minutes.

Any ideas?

Thanks for your help!
Pony
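
P.S. In case it helps to see the job logic, here is a rough local sketch in Python of what the map and reduce steps do. It is not my actual Hadoop code. The log layout (host in the first field, byte count in the last field) is only an assumption for illustration, and the names map_record, reduce_bytes and run_job are made up.

```python
from collections import defaultdict


def map_record(line):
    """Map step: turn one log line into a (host, bytes) pair.

    Assumed layout (hypothetical): host is the first whitespace-separated
    field and the byte count is the last one. Returns None for lines that
    do not fit that layout.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    try:
        nbytes = int(fields[-1])
    except ValueError:
        return None
    return fields[0], nbytes


def reduce_bytes(pairs):
    """Reduce step: sum the byte counts for each host."""
    totals = defaultdict(int)
    for host, nbytes in pairs:
        totals[host] += nbytes
    return dict(totals)


def run_job(lines):
    """Run map then reduce over an iterable of log lines, skipping bad lines."""
    pairs = (pair for pair in map(map_record, lines) if pair is not None)
    return reduce_bytes(pairs)


if __name__ == "__main__":
    sample = [
        "a.example.com GET /x 100",
        "b.example.com GET /y 50",
        "a.example.com GET /z 25",
    ]
    print(run_job(sample))
```

The real job does the same thing per record, just spread across the 19 TaskTrackers, with one output key per host.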
