Re: HDFS load(traffic) balancing

Raghu Angadi Tue, 17 Feb 2009 16:55:25 -0800

Sangmin Lee wrote:

Hi folks,


I have a question regarding hdfs' load balancing when it chooses target
datanodes for a block.
From the code, it seems to make its decision based on information from
previous heartbeats.
Since heartbeats come every 3 seconds, within that window we may end up
putting more load on some datanodes than others.
I noticed that for disk space balancing, the namenode maintains scheduled block
information for each datanode, which is updated whenever a new block is
assigned to that datanode.
Shouldn't we do a similar thing for traffic?

We should. HADOOP-3707 was meant for a dot release and thus didn't want to depend on the new stat too much at that time. The comments in the JIRA and in the code mention this.

Unless you have a large heartbeat interval, do you really think it makes much difference in the normal case? We would like to know if you saw any such cases.

It could help if there are a large number of clients simultaneously writing from a small set of nodes.
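
To make the idea concrete, here is a rough sketch of what "scheduled" accounting for traffic might look like. It is not the actual namenode code: NodeLoad, TrafficAwareChooser, and every field and method name are hypothetical, and resetting the counter on each heartbeat is a simplifying assumption. The point is only that the namenode would add its own record of recent assignments to the load reported in the last heartbeat, so several writers arriving within one heartbeat window do not all land on the same datanode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-datanode record; a stand-in, not the real namenode class.
class NodeLoad {
    final String name;
    int reportedTransfers;   // active transfers reported in the last heartbeat
    int scheduledSince;      // blocks assigned to this node since that heartbeat

    NodeLoad(String name, int reportedTransfers) {
        this.name = name;
        this.reportedTransfers = reportedTransfers;
    }

    // Assumption: a fresh heartbeat already reflects transfers that have
    // started, so the local estimate is cleared when one arrives.
    void onHeartbeat(int transfers) {
        reportedTransfers = transfers;
        scheduledSince = 0;
    }

    // Load as the namenode would estimate it between heartbeats.
    int estimatedLoad() {
        return reportedTransfers + scheduledSince;
    }
}

// Hypothetical chooser that picks the least-loaded node by estimated load.
class TrafficAwareChooser {
    private final List<NodeLoad> nodes = new ArrayList<>();

    void add(NodeLoad node) {
        nodes.add(node);
    }

    // Returns the node with the lowest estimated load (the first one added
    // wins a tie) and counts the new assignment against it right away.
    // Throws NoSuchElementException if no nodes were added.
    NodeLoad chooseTarget() {
        NodeLoad best = Collections.min(
                nodes, Comparator.comparingInt(NodeLoad::estimatedLoad));
        best.scheduledSince++;
        return best;
    }
}
```

With two equally loaded nodes, two back-to-back calls to chooseTarget() return different nodes, whereas a chooser that looked only at reportedTransfers would return the same node both times until the next heartbeat.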

Based on discussions here at Yahoo, this area of NN scheduling will undergo some improvements in the near future, especially to handle clusters with heterogeneous datanodes.


Raghu.

Thanks,
Sangmin Lee
