Hello everyone.

I have run into a very strange situation
with HDFS read behavior.

 

My cluster has 1 master and 10 slaves.

 

When I put a file A into HDFS with
dfs.replication=10, I can see that every block of file A is replicated on every
node.
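
The upload and the replica check can be sketched roughly as follows. This is only an illustration: the helper name put_and_locate and the paths in the usage line are hypothetical placeholders, not the exact commands from my session. It passes dfs.replication as a generic -D option to hdfs dfs -put and then asks hdfs fsck to list the DataNodes holding each block.

```shell
# Hypothetical helper (name and arguments are placeholders):
# upload a local file with a given replication factor, then
# print which DataNodes hold each block of the uploaded file.
put_and_locate() {
  src="$1"   # local file to upload
  dst="$2"   # full target file path in HDFS
  rep="$3"   # replication factor to request for this upload

  hdfs dfs -D dfs.replication="$rep" -put "$src" "$dst" &&
    hdfs fsck "$dst" -files -blocks -locations
}

# Usage (placeholder paths): put_and_locate A /user/me/A 10
```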

So it is reasonable to expect that the HDFS
file reader can act as a local block reader when I read file A.

 

However, when I execute hdfs dfs -copyToLocal
A /to/my/localDir, the read time is the same as with
dfs.replication=1.
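
A minimal sketch of how the two cases can be timed is below. The helper name time_copy and the paths in the usage line are hypothetical, and wall-clock seconds from date are a coarse measure, so this only shows the shape of the comparison.

```shell
# Hypothetical helper (name is a placeholder): copy an HDFS file to a
# local directory and report the elapsed wall-clock time in seconds.
time_copy() {
  src="$1"    # file path in HDFS
  dest="$2"   # local destination directory

  start=$(date +%s)
  hdfs dfs -copyToLocal "$src" "$dest"
  status=$?
  end=$(date +%s)

  echo "elapsed_seconds=$((end - start))"
  return "$status"
}

# Usage (placeholder paths): time_copy /user/me/A /to/my/localDir
```

Running it once on a file written with dfs.replication=1 and once on a file written with dfs.replication=10 gives the two numbers being compared.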

 

So I monitored network usage,
especially the data read and written.

In both cases (dfs.replication=1 and dfs.replication=10),
the network is fully utilized.

This suggests that reading the file does not
take block location into account.

 

Is this the expected behavior of HDFS?

If so, what does data locality actually
mean in HDFS? (We all know about data locality for map tasks.)

 

I want to know why the performance is the same
in both “copyToLocal” cases.

 

Thanks!
Yoonmin


// Yoonmin Nam

