Hi, I'm Mike, a new Hadoop user. Currently I have a cluster of 8 machines and a file of about 2 GB. When I load it into HDFS with "hadoop dfs -put /a.dat /data", it seems to get loaded onto all of the data nodes: dfsadmin -report shows HDFS usage of 16 GB. It also takes 2 hours to load that one file.
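For reference, here is roughly what I am running with the standard HDFS shell (the fsck line is just something I understand can show how the file's blocks are placed across nodes; I have not verified its output on my cluster):

```shell
# Load the local file into HDFS (this is the step taking 2 hours)
hadoop dfs -put /a.dat /data

# Cluster-wide usage summary (this is where I see 16 GB used)
hadoop dfsadmin -report

# Supposedly shows block count, size, and which datanodes hold each block
hadoop fsck /data/a.dat -files -blocks -locations
```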
With 1 node, my MapReduce job on this data took 150 seconds. But when I ran the same job on this 8-node cluster, it took 220 seconds for the same file. Can someone please tell me how to distribute this file over the 8 nodes, so that each of them holds roughly 300 MB of the file, and so that the MapReduce job I wrote runs in parallel? Isn't a Hadoop cluster supposed to work in parallel? Best.
