Hi,

We've finally got our Hadoop cluster up, some data to crunch, and a map/reduce job. After trying a few configurations, I'm not sure about our performance and would like some advice.

We have a 20-node EC2 cluster and 750 MB of data. Currently the job seems to progress at about 1% per minute on the cluster. Running locally on a much smaller subset of the data, the job takes a matter of seconds.

Here's our hadoop-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>** domain omitted **</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>** domain omitted **</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>15</value>
  </property>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>20</value>
  </property>
</configuration>

Everything else is left at the defaults.

Thanks,
- Jonathan
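A rough sketch of how the settings above might divide the 750 MB input into map tasks and how many rounds ("waves") of tasks the 20 nodes would need. It models the split-size rule of the old-API FileInputFormat as we understand it, split_size = max(min_size, min(total_size / mapred.map.tasks, block_size)), and treats mapred.map.tasks as a hint rather than a hard count. Assumptions not stated in the post: the input is a single uncompressed, splittable file; the block size is 64 MB; the minimum split size is 1 byte; and every task slot runs a map (reduces ignored). The helper names compute_splits and map_waves are hypothetical.

```python
import math

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size

def compute_splits(file_size, num_splits_hint, block_size, min_size=1):
    """Hypothetical helper: return the split lengths for one file, using
    split_size = max(min_size, min(goal_size, block_size)).
    ASSUMPTION: single uncompressed, splittable input file."""
    goal_size = file_size // max(num_splits_hint, 1)
    split_size = max(min_size, min(goal_size, block_size))
    splits = []
    remaining = file_size
    # Carve off full splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining != 0:
        splits.append(remaining)  # leftover tail becomes the final split
    return splits

def map_waves(num_maps, nodes, tasks_per_node):
    """Hypothetical helper: rounds of maps needed if every slot runs a map."""
    return math.ceil(num_maps / (nodes * tasks_per_node))

# ASSUMPTION: 64 MB block size; 100 and 20 come from hadoop-site.xml above.
splits = compute_splits(750 * MB, num_splits_hint=100, block_size=64 * MB)
print(len(splits), "map tasks of", splits[0] / MB, "MB each")
print("waves:", map_waves(len(splits), nodes=20, tasks_per_node=20))
```

Under these assumptions the script prints `100 map tasks of 7.5 MB each` and `waves: 1`: each map reads only about 7.5 MB and all of them fit in one round of the 400 available slots. If the job actually reports a different number of map tasks, one of the assumptions above (for example, a single splittable input file) does not hold for this data.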
