I am sure re-replication is not done on every missed heartbeat, since that would be very expensive and inefficient. At the same time, you cannot really tell whether a node is partitioned away, crashed, or just slow. Is it threshold based, i.e. once N heartbeats have been missed, re-replicate? Which package in the source code could I look at to glean this information?

Thanks
A

On 7/17/07, Phantom <[EMAIL PROTECTED]> wrote:
That's awesome.

Thanks
A

On 7/17/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Phantom wrote:
> > Here is the scenario I was concerned about. Consider three nodes in the
> > system, A, B and C, which are placed, say, in different racks. Let us say
> > that the disk on A fries up today. Now the blocks that were stored on A
> > are not going to be re-replicated (this is my understanding, but I could
> > be wrong in this assumption) to some other node or to the new disk with
> > which you would bring back A.
>
> That's incorrect. When a datanode fails to send a heartbeat to the
> namenode in a timely manner then its data is assumed missing and is
> re-replicated. And when block corruption is detected, corrupt replicas
> are removed and non-corrupt replicas are re-replicated to maintain the
> desired level of replication.
>
> Doug
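
To make the threshold question concrete, here is a minimal sketch of timeout-based failure detection followed by a check for under-replicated blocks. It is not taken from the Hadoop source. The names `HeartbeatMonitor`, `missed_allowed` and `blocks_to_rereplicate` are hypothetical, and the interval and threshold values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time per node (hypothetical sketch)."""
    heartbeat_interval: float = 3.0   # assumed seconds between heartbeats
    missed_allowed: int = 10          # assumed threshold N of missed heartbeats
    last_seen: dict = field(default_factory=dict)

    @property
    def expire_after(self) -> float:
        # A node is only declared dead after N whole intervals of silence,
        # so a single late heartbeat does not trigger re-replication.
        return self.heartbeat_interval * self.missed_allowed

    def record_heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> list:
        # Names of nodes whose last heartbeat is older than the expiry window.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.expire_after
        )


def blocks_to_rereplicate(block_locations, dead, target_replicas=3):
    """Return {block: missing_replica_count} for under-replicated blocks.

    block_locations maps a block id to the set of nodes holding a replica.
    Replicas on dead nodes are treated as missing.
    """
    result = {}
    for block, nodes in block_locations.items():
        live = len(set(nodes) - set(dead))
        if live < target_replicas:
            result[block] = target_replicas - live
    return result
```

With these assumed values a node is treated as failed only after 30 seconds of silence. The detector cannot tell a crashed node from a partitioned or slow one; it just waits long enough that re-replicating is a reasonable response in all three cases.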
