If we moved to a scheme where the name node was just given a small number of blocks with each heartbeat, there would be no reason not to start reporting blocks immediately, would there? Or the name node could respond to each heartbeat with the block range it wants in the next heartbeat...

On Apr 3, 2006, at 2:42 PM, Doug Cutting wrote:

Hairong Kuang wrote:
Currently dfs datanodes send heartbeats and getBlockwork requests to the namenode at the same frequency (once every 3 seconds) after a certain startup delay. Is there any design reason that we need two separate messages instead of one? I am thinking that if we let a sendHeartbeat request return the blocks to be deleted or replicated, we would be able to cut the network traffic in dfs.

No, that sounds like a reasonable change to me.
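
As a rough illustration of the merged message discussed above, the sketch below has a single heartbeat call return the pending blockwork. It is not code from this thread or from the DFS source: HeartbeatReply, SketchNameNode, and every method on them are hypothetical names, and the toy namenode tracks work for only one datanode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reply type for illustration; not the actual DFS protocol.
final class HeartbeatReply {
    final List<Long> blocksToDelete;
    final List<Long> blocksToReplicate;

    HeartbeatReply(List<Long> blocksToDelete, List<Long> blocksToReplicate) {
        this.blocksToDelete = blocksToDelete;
        this.blocksToReplicate = blocksToReplicate;
    }
}

// Toy namenode holding pending work for a single datanode (an assumption
// made for brevity; a real namenode would key this by datanode).
final class SketchNameNode {
    private final List<Long> pendingDeletes = new ArrayList<>();
    private final List<Long> pendingReplications = new ArrayList<>();
    private int heartbeatsSeen = 0;

    void scheduleDelete(long blockId) { pendingDeletes.add(blockId); }

    void scheduleReplication(long blockId) { pendingReplications.add(blockId); }

    // One message does both jobs: it records liveness and hands back
    // whatever blockwork has accumulated since the previous heartbeat.
    HeartbeatReply sendHeartbeat() {
        heartbeatsSeen++;
        HeartbeatReply reply = new HeartbeatReply(
            new ArrayList<>(pendingDeletes),
            new ArrayList<>(pendingReplications));
        pendingDeletes.clear();
        pendingReplications.clear();
        return reply;
    }

    int heartbeatsSeen() { return heartbeatsSeen; }
}
```

With this shape, a datanode that previously sent a heartbeat and a getBlockwork request every 3 seconds would send one message per interval instead of two.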

The startup delay will need to be re-implemented somehow. Perhaps we could simply change this to a timer in the namenode, so that it waits a while on startup before giving out any blockwork. We might then have issues if, e.g., the namenode's ethernet cable were yanked for a few minutes. When it is reconnected, the namenode will start issuing lots of unneeded replication requests. Having a delay in blockwork at the datanode each time it establishes a new connection to the namenode solves that problem. Are there other cases that the current startup blockwork delay is handling?
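
The per-connection delay described above might be sketched on the datanode side roughly as follows. BlockworkGate and its methods are hypothetical names, not part of the DFS code. The clock is passed in as a parameter so the behaviour is easy to check, and the delay length is left to the caller.

```java
// Hypothetical helper: suppresses blockwork until a fixed delay has passed
// since the datanode last established a connection to the namenode.
final class BlockworkGate {
    private final long delayMillis;
    private long connectedAtMillis;
    private boolean connected = false;

    BlockworkGate(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Called each time a new connection to the namenode is established,
    // which restarts the delay (covers both startup and reconnection).
    void onConnect(long nowMillis) {
        connected = true;
        connectedAtMillis = nowMillis;
    }

    void onDisconnect() {
        connected = false;
    }

    // True only when connected and the delay since the most recent
    // connection has fully elapsed.
    boolean mayDoBlockwork(long nowMillis) {
        return connected && nowMillis - connectedAtMillis >= delayMillis;
    }
}
```

Because the timer restarts on every new connection rather than only at process startup, a namenode that drops off the network for a few minutes and comes back would not get blockwork acted on right away.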

Doug
