Hello,

I am running Hadoop on my 4 nodes system.

Initially, I pick the replication factor of 2, and nearly 100% of map
tasks run in local up to 3 nodes, but the ratio drops to 80% if I use
all 4 nodes.

As my nodes have quite high I/O bandwidth (24 disks per node), but only
limited network bandwidth (1 GigE), this really hampers the scalability.

Just for test purpose, I increase the replication factor to 4, and check
that input data actually has replication factor of 4 with 'hadoop fs
-stat %r%n' but find that the ratio is still around 80% for 4 nodes. 

How this can happen, and which configuration parameter should I change
to fix this?

Thank you very much,

-seunghwa

Reply via email to