Boyu Zhang wrote:
Dear Steve,
Thank you for your reply. I was worried that my email had gotten lost, but I
will wait longer for an answer next time. Thank you for reminding me : )
I understand that if you have data replica = 3, the namenode will assign the
blocks that way. However, I still have a question: if the data replica = 1
(I just use it for testing, to see how HDFS works), what is the policy to
decide which datanode gets which block? Thank you so much!
If you are running your code on a datanode, the block will be written to
the machine you are running on (to save bandwidth). Otherwise, another
machine will be picked somehow (I forget where and how). Hadoop tries to
keep the data balanced across machines, so that no single machine ends up
holding all the data while others hold less. I don't know whether it goes
by percentage of disk space free or by total amount of data. You'd have to
rummage in the source to work it out.
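
To make the replica = 1 case concrete, here is a rough Python sketch of the
behaviour described above. It is not Hadoop's actual code: the function name
choose_datanode and the host names are made up for illustration, and the
uniform random fallback is only an assumption standing in for whatever the
namenode really does when the writer is not on a datanode.

```python
import random


def choose_datanode(writer_host, datanodes, rng=random):
    """Pick the datanode that receives the single replica of a block.

    Hypothetical helper, not a Hadoop API.
    """
    if not datanodes:
        raise ValueError("no datanodes available")
    # Writer is itself a datanode: keep the block local to save bandwidth.
    if writer_host in datanodes:
        return writer_host
    # ASSUMPTION: otherwise pick uniformly at random. The real policy may
    # weigh free disk space or the amount of data already stored.
    return rng.choice(sorted(datanodes))


nodes = {"dn1", "dn2", "dn3"}
print(choose_datanode("dn2", nodes))            # writer runs on a datanode
print(choose_datanode("client-laptop", nodes))  # writer is outside the cluster
```

The first call always returns the writer's own host; the second returns
some datanode from the set, which one depending on the random choice.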
As I said, there has been discussion about improving the layout algorithms
to support plugins with different policies.