I checked with

    bin/hadoop fs -stat "%n %r" input/*

    part-00000 4
    part-00001 4
    part-00002 4
    part-00003 4
    part-00004 4
    part-00005 4
    part-00006 4
    part-00007 4

and see that the replication factor is 4.

I also set the replication factor to 4 in hadoop-site.xml, ran
stop-all.sh and start-all.sh, re-loaded the data, and re-ran the code,
but I am still getting the same result.

Searching hadoop-default.xml, I found

    <property>
      <name>dfs.balance.bandwidthPerSec</name>
      <value>1048576</value>
      <description>
        Specifies the maximum amount of bandwidth that each datanode can
        utilize for the balancing purpose in term of the number of bytes
        per second.
      </description>
    </property>

1048576 bytes/s is 1 MB/s (about 8 Mbit/s), which is far lower than the
1 Gbit/s links on my nodes. I am going to change this value and see what
happens.

Any other suggestions?

Thank you very much,

-seunghwa

On Fri, 2009-07-17 at 16:07 -0700, Ted Dunning wrote:
>
> Does [hadoop fsck /] show any under-replicated files/blocks? You
> may not have waited long enough after increasing the target
> replication rate.
>
> Another thing to watch out for in a production node is the
> distribution of node blocks. You should be careful to load data from
> outside the cluster to ensure random placement of file blocks. That
> is critical for getting good locality. This obviously doesn't apply
> to your situation with 4 replicas on 4 nodes.
>
> Todd's comment about -setrep is also very important to note.
>
> On Fri, Jul 17, 2009 at 3:57 PM, Seunghwa Kang <[email protected]>
> wrote:
> > Just for test purpose, I increased the replication factor to 4 and
> > checked that the input data actually has a replication factor of 4
> > with 'hadoop fs -stat %r%n', but I find that the ratio is still
> > around 80% for 4 nodes.
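
P.S. For anyone checking the units of dfs.balance.bandwidthPerSec, here is a small sketch of the conversion. It assumes only what the property description says, that the value is in bytes per second. The variable names are mine and purely illustrative, and the 1 Gbit/s figure is the link speed of my nodes, not anything Hadoop reports.

```shell
# Default dfs.balance.bandwidthPerSec from hadoop-default.xml (bytes per second).
bytes_per_sec=1048576

# Convert to MiB/s (1 MiB = 1024 * 1024 bytes).
mib_per_sec=$((bytes_per_sec / 1024 / 1024))

# Convert to Mbit/s (8 bits per byte, 1 Mbit = 1000000 bits); integer division truncates.
mbit_per_sec=$((bytes_per_sec * 8 / 1000000))

# A 1 Gbit/s link expressed in bytes per second, for comparison (assumed link speed).
gbit_link_bytes_per_sec=$((1000000000 / 8))

echo "balancer limit: ${mib_per_sec} MiB/s (~${mbit_per_sec} Mbit/s)"
echo "1 Gbit/s link:  ${gbit_link_bytes_per_sec} bytes/s"
```

So the default balancer limit is a small fraction of what a 1 Gbit/s link can carry.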
