Hi all, I have a Hadoop cluster with one master and three datanodes. I want to put a local file of about 128 MB into HDFS, and I have set the block size to 10 MB.
When I set the replication to 0, I found that all the data went to the node where I ran the command 'bin/hadoop dfs -put file.gz input', so that node's disk usage grew by about 128 MB while the other nodes used no extra space. When I set the replication to 3, every node ended up with the same data, so each node used about 128 MB of disk space. What should I do to spread the data across the datanodes? I'm using hadoop-0.15.2. Can anyone help me? Thanks.
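In case it matters, my conf/hadoop-site.xml looks roughly like this (these are the hadoop-0.15.x property names; dfs.block.size is in bytes, and 10485760 = 10 MB; I change dfs.replication between runs):

    <property>
      <name>dfs.block.size</name>
      <value>10485760</value>
      <description>10 MB blocks instead of the default 64 MB</description>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>set to 0, 1, or 3 in my different tests</description>
    </property>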
