Thanks Aaron, I changed the settings in the hadoop-site.xml file on all the machines. BTW, some settings are only reflected at the job level when I change the hadoop-default.xml file; I'm not sure why hadoop-site.xml is being ignored (e.g., mapred.tasktracker.map.tasks.maximum).
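
To make the setting concrete, here is a rough sketch of the kind of override in question, as it would sit inside the <configuration> element of hadoop-site.xml. The value 4 is only an illustrative assumption (one map slot per core on the machines described in this message), not necessarily the value actually in use:

```xml
<!-- Sketch of an override in hadoop-site.xml on each node. -->
<!-- The value 4 is an assumed example (one map slot per core). -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```
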
The files I am trying to load are fairly small (~4 MB on average). The configuration of each machine is: 2 dual-core CPUs (Xeon, 2.33 GHz), 8 GB RAM, and a local SCSI hard drive (6 nodes in total).
I will look into the article you mentioned. I understand that loading the files is going to be slow; I was just wondering why the machines are mostly idle when more maps could be run in parallel. The number of running maps is always 6.
Another option is to load one 20 GB file, but currently the speed is fairly slow in my opinion: 1 GB in 1.5 min (about 11 MB/s).
What kind of tuning can be done to speed up the load into HDFS? If you have any recommendations for specific parameters that might help, that would be great.
Thanks,
Zeev
