Hi Pony,

There is a good chance that your boxes are doing some heavy swapping, and that is a killer for Hadoop. Have you tried setting mapred.job.reuse.jvm.num.tasks=-1 and limiting the heap as much as possible on those boxes?
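Something along these lines in mapred-site.xml, as a sketch; the 200 MB heap is an assumed value for a 600 MB box, not a tested recommendation:

```xml
<!-- Sketch only: -Xmx200m is an assumed heap for 600 MB boxes, tune to taste. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value> <!-- -1 = reuse the task JVM for unlimited tasks -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m</value> <!-- cap the child task heap to avoid swapping -->
</property>
```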
Cheers,
Esteban.

--
Get Hadoop! http://www.cloudera.com/downloads/

On Thu, Jul 7, 2011 at 1:29 PM, Juan P. <[email protected]> wrote:
> Hi guys!
>
> I'd like some help fine-tuning my cluster. I currently have 20 boxes
> exactly alike: single-core machines with 600MB of RAM. No chance of
> upgrading the hardware.
>
> My cluster is made up of 1 NameNode/JobTracker box and 19
> DataNode/TaskTracker boxes.
>
> All my config is default, except I've set the following in my
> mapred-site.xml in an effort to prevent choking my boxes:
>
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>1</value>
> </property>
>
> I'm running a MapReduce job which reads a proxy server log file (2GB),
> maps hosts to each record, and then in the reduce task accumulates the
> number of bytes received from each host.
>
> Currently it's producing about 65000 keys.
>
> The whole job takes forever to complete, especially the reduce part. I've
> tried different tuning configs but I can't bring it down under 20 mins.
>
> Any ideas?
>
> Thanks for your help!
> Pony
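For reference, the per-host byte accumulation the quoted job performs in its reduce phase amounts to the following, sketched here in plain Java without the Hadoop API (the class and method names are illustrative, not from the actual job):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the reduce-side logic described above:
// sum the bytes received per host. Each record is modeled as a
// { host, byteCount } pair; in the real job these would arrive as
// key/value pairs grouped by host.
public class HostBytes {
    public static Map<String, Long> sumBytesPerHost(String[][] records) {
        Map<String, Long> totals = new HashMap<>();
        for (String[] rec : records) {
            String host = rec[0];
            long bytes = Long.parseLong(rec[1]);
            // Accumulate the running total for this host.
            totals.merge(host, bytes, Long::sum);
        }
        return totals;
    }
}
```

With ~65000 distinct hosts this map stays small; the expensive part on 600 MB boxes is the shuffle and the JVM overhead around it, not the arithmetic itself.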
