I checked with

bin/hadoop fs -stat "%n %r" input/*

part-00000 4
part-00001 4
part-00002 4
part-00003 4
part-00004 4
part-00005 4
part-00006 4
part-00007 4

and see that the replication factor is 4.

Also, I set the replication factor to 4 in hadoop-site.xml, ran
stop-all.sh and start-all.sh, re-loaded the data, and re-ran the code,
but I am still getting the same result.

I searched hadoop-default.xml and found

<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>1048576</value>
<description>
Specifies the maximum amount of bandwidth that each datanode
can utilize for the balancing purpose in term of
the number of bytes per second.
</description>
</property>

1048576 bytes/s is 1 MB/s (roughly 8 Mbit/s), which is far below the
1 Gbit/s of my nodes. I am going to change this value and see what
happens.
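
As a quick sanity check on the units, here is a minimal sketch of the
conversion. The property description says the value is in bytes per
second; the 1 Gbit/s link speed is my assumption about the nodes, and
the function names are only illustrative.

```python
# Convert the dfs.balance.bandwidthPerSec default (bytes per second)
# into units that can be compared with a network link speed.

BALANCE_BANDWIDTH_BYTES_PER_SEC = 1048576  # default from hadoop-default.xml
LINK_BITS_PER_SEC = 1_000_000_000          # assumed: 1 Gbit/s link per node


def to_mib_per_sec(bytes_per_sec):
    """Bytes per second expressed in MiB per second (1 MiB = 1024 * 1024 bytes)."""
    return bytes_per_sec / (1024 * 1024)


def to_mbit_per_sec(bytes_per_sec):
    """Bytes per second expressed in megabits per second (1 Mbit = 10**6 bits)."""
    return bytes_per_sec * 8 / 1_000_000


def link_fraction(bytes_per_sec, link_bits_per_sec):
    """Fraction of the link that the given byte rate would occupy."""
    return bytes_per_sec * 8 / link_bits_per_sec


if __name__ == "__main__":
    rate = BALANCE_BANDWIDTH_BYTES_PER_SEC
    print("MiB/s:", to_mib_per_sec(rate))
    print("Mbit/s:", to_mbit_per_sec(rate))
    print("fraction of link:", link_fraction(rate, LINK_BITS_PER_SEC))
```

So the default works out to 1 MiB/s, a bit under 1% of a 1 Gbit/s
link. Going by its description, this setting only limits the traffic
used for balancing.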

Any other suggestions?

Thank you very much,

-seunghwa

On Fri, 2009-07-17 at 16:07 -0700, Ted Dunning wrote:
> 
> Does [hadoop fsck /] show any under-replicated files/blocks?  You
> may not have waited long enough after increasing the target
> replication factor.
> 
> Another thing to watch out for in a production node is the
> distribution of node blocks.  You should be careful to load data from
> outside the cluster to ensure random placement of file blocks.  That
> is critical for getting good locality.  This obviously doesn't apply
> to your situation with 4 replicas on 4 nodes.
> 
> Todd's comment about -setrep is also very important to note.
> 
> On Fri, Jul 17, 2009 at 3:57 PM, Seunghwa Kang <[email protected]>
> wrote:
>         
>         Just for test purposes, I increased the replication factor to 4
>         and checked with 'hadoop fs -stat %r%n' that the input data
>         actually has a replication factor of 4, but the ratio is still
>         around 80% for 4 nodes.
>         
> 
