Does [hadoop fsck /] show any under-replicated files or blocks? You may not have waited long enough after increasing the target replication factor.
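If you want to script that check, here is a minimal sketch. It assumes the fsck summary contains a line of the form "Under-replicated blocks: N (P %)" and that the hadoop launcher is on your PATH. The function names are made up for illustration, not part of any Hadoop API.

```python
import re
import subprocess

# Assumed shape of the fsck summary line, e.g. " Under-replicated blocks:\t12 (30.0 %)"
_UNDER_REPLICATED = re.compile(r"Under-replicated blocks:\s*(\d+)")


def under_replicated_blocks(fsck_output):
    """Return the under-replicated block count from fsck text, or None if the line is absent."""
    match = _UNDER_REPLICATED.search(fsck_output)
    return int(match.group(1)) if match else None


def run_fsck(path="/"):
    """Run `hadoop fsck <path>` and return its standard output as text."""
    result = subprocess.run(["hadoop", "fsck", path], capture_output=True, text=True)
    return result.stdout
```

Calling under_replicated_blocks(run_fsck("/")) on a cluster node should give a count that drops to 0 once the extra replicas have been created.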
Another thing to watch out for in a production cluster is the distribution of blocks across nodes. You should be careful to load data from outside the cluster to ensure random placement of file blocks; a client running on a datanode gets the first replica of every block written to that same node. Random placement is critical for getting good locality. This obviously doesn't apply to your situation with 4 replicas on 4 nodes.

Todd's comment about -setrep is also very important to note.

On Fri, Jul 17, 2009 at 3:57 PM, Seunghwa Kang <[email protected]> wrote:
>
> Just for test purpose, I increase the replication factor to 4, and check
> that input data actually has replication factor of 4 with 'hadoop fs
> -stat %r%n' but find that the ratio is still around 80% for 4 nodes.
>
