Boyu Zhang wrote:
Dear Steve,

Thank you for your reply. I was worried that my email had got lost, but I will
wait longer for an answer next time. Thank you for reminding me : )

I understand that if you have data replica = 3, the namenode will assign the
blocks that way. However, I still have a question: if the data replica = 1
(I just use it for testing, to see how HDFS works), what is the policy to
decide which datanode gets which block? Thank you so much!

If you are running your code on a datanode, the block will be written to the machine you are running on (to save bandwidth). Otherwise, another machine will be picked somehow (I forget where and how). Hadoop tries to keep the data balanced across machines, so that no single machine ends up with all the data while the others have less. I don't know whether it goes by percentage of disk space free or by total amount of data. You'd have to rummage in the source to work it out.
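To make the single-replica case concrete, here is a rough toy model in Python. It is not Hadoop's actual code: the names DataNode, choose_datanode and write_block are made up for illustration, and the fallback rule (pick the node with the lowest fraction of disk used) is purely an assumption standing in for whatever the namenode really does.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical stand-in for a datanode; not a Hadoop class."""
    name: str
    capacity: int  # total disk space, in bytes
    used: int = 0  # bytes already stored


def used_fraction(node):
    """Fraction of the node's disk that is already full."""
    return node.used / node.capacity


def choose_datanode(nodes, writer_host=None, load=used_fraction):
    """Pick the node that receives the single replica of a block."""
    # Rule described above: a writer running on a datanode keeps the
    # block local, to save bandwidth.
    for node in nodes:
        if node.name == writer_host:
            return node
    # ASSUMPTION: otherwise take the least loaded node. The real metric
    # is unknown here, so `load` can be swapped for another measure,
    # for example total bytes stored.
    return min(nodes, key=load)


def write_block(nodes, size, writer_host=None):
    """Place one block of `size` bytes and return the chosen node's name."""
    target = choose_datanode(nodes, writer_host)
    target.used += size
    return target.name
```

With nodes dn1, dn2 and dn3, a call such as write_block(nodes, 5, writer_host="dn1") keeps the block on dn1, while a writer on a host that is not a datanode falls through to the assumed balancing rule.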

Like I said, there's been discussion on improving the layout algorithms, to support plugins with different policies.
