Hi,

I'm Mike.
I am a new user of Hadoop. Currently I have a cluster of 8 machines and a
file of size 2 GB.
When I load it into HDFS with the command

hadoop dfs -put /a.dat /data

it actually gets copied onto all of the data nodes: dfsadmin -report shows
HDFS usage of 16 GB. Loading the file also takes 2 hours.
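In case it is relevant: as far as I understand, the number of copies HDFS keeps of each block is controlled by the dfs.replication property in hdfs-site.xml (I believe the default is 3). Since my usage is 8x the file size, I am guessing my cluster may have been configured with something like the following (the value here is just my guess, not my actual config):

```xml
<configuration>
  <!-- Guess: replication set to the cluster size, which would explain 8x usage -->
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
</configuration>
```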

With 1 node, my MapReduce job on this data took 150 seconds.

When I ran the same job on this cluster, it took 220 seconds for the same
file.

Can someone please tell me how to distribute this file over the 8 nodes, so
that each of them holds roughly a 300 MB chunk and the MapReduce job I wrote
runs in parallel? Isn't a Hadoop cluster supposed to work in parallel?

Best,
Mike
