Eric Baldeschwieler wrote:
If we moved to a scheme where the name node was just given a small number of blocks with each heartbeat, there would be no reason to not start reporting blocks immediately, would there?
There would still be a small storm of unneeded replications on startup. Say it takes a minute at startup for all data nodes to report their complete block lists to the name node. If heartbeats are every 3 seconds, that is 20 heartbeats in that minute, so all but the last data node to report in would be handed as many as 20 small lists of blocks to start replicating. The switches could be saturated with unneeded transfers, which would slow startup. Then, for the next minute after startup, the nodes would be told to delete blocks that are now over-replicated. We'd like startup to be as fast and painless as possible. Waiting a bit before checking whether blocks are over- or under-replicated seems a good way to get there.
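
To make the numbers above concrete, here is a rough sketch of the reasoning. It is only a toy model, not name node code: the function names, the block and node ids, and the replication target of 3 are all assumptions made for illustration.

```python
# Toy model of the startup scenario; nothing here is actual name node code.
STARTUP_SECONDS = 60    # time for all data nodes to finish reporting (example figure)
HEARTBEAT_SECONDS = 3   # heartbeat interval (example figure)


def heartbeats_during_startup(startup_seconds=STARTUP_SECONDS,
                              heartbeat_seconds=HEARTBEAT_SECONDS):
    """Heartbeats one data node sends while block reports are still arriving."""
    return startup_seconds // heartbeat_seconds


def apparently_under_replicated(block_locations, reported_nodes, target=3):
    """Blocks that look under-replicated when only some nodes have reported.

    block_locations maps a block id to the nodes that really hold a replica.
    target is an assumed replication factor.
    """
    reported = set(reported_nodes)
    return sorted(
        block for block, nodes in block_locations.items()
        if len(set(nodes) & reported) < target
    )


def should_check_replication(seconds_since_startup,
                             grace_seconds=STARTUP_SECONDS):
    """Hold off on replication checks until the grace period has passed."""
    return seconds_since_startup >= grace_seconds


# Every block is fully replicated, but the name node cannot know that yet.
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}
early_view = apparently_under_replicated(block_locations, ["dn1", "dn2"])
full_view = apparently_under_replicated(block_locations,
                                        ["dn1", "dn2", "dn3", "dn4"])
```

With only dn1 and dn2 reported, both blocks look under-replicated even though nothing is wrong; once all four nodes have reported, neither does. Acting on the early view is what would trigger the transfers and the later deletes, and the grace-period check is the "waiting a bit" part.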
Doug
