The reason I ask is that I know that in S3, and in the P2P storage systems I have been involved in, we had a replica synchronization algorithm that ran once every night and relied on techniques like Merkle tree comparisons. Anyway, understanding how this is handled here would be beneficial. I don't mind reading through the sources, but I would appreciate being pointed to the correct package.
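To make the Merkle tree comparison mentioned above concrete, here is a minimal sketch of how two replicas could locate the blocks on which they disagree. It is only an illustration under assumptions: the helper names (`build_merkle`, `diff_leaves`), the use of SHA-256, and the in-memory block lists are all hypothetical, and nothing here is taken from the Hadoop sources or from S3.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption; any strong hash would do)."""
    return hashlib.sha256(data).digest()


def build_merkle(blocks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    if not blocks:
        return [[_h(b"")]]
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of nodes: pair the last hash with itself.
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_leaves(a, b):
    """Return indices of leaf blocks that differ between two equal-shaped trees."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("replicas must hold the same number of blocks")
    top = len(a) - 1
    differing = []
    stack = [(top, 0)]
    while stack:
        lvl, idx = stack.pop()
        if a[lvl][idx] == b[lvl][idx]:
            continue  # identical subtree: nothing below needs checking
        if lvl == 0:
            differing.append(idx)
            continue
        for child in (2 * idx, 2 * idx + 1):
            if child < len(a[lvl - 1]):
                stack.append((lvl - 1, child))
    return sorted(differing)


replica_a = [b"block-0", b"block-1", b"block-2", b"block-3", b"block-4"]
replica_b = [b"block-0", b"block-1", b"block-2", b"stale-3", b"block-4"]
print(diff_leaves(build_merkle(replica_a), build_merkle(replica_b)))
```

The point of the tree is that two replicas in sync only need to exchange the root hash; the descent into subtrees happens only where the hashes disagree.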
Thanks
A

On 7/17/07, Phantom <[EMAIL PROTECTED]> wrote:
I am sure re-replication is not done on every heartbeat miss, since that would be very expensive and inefficient. At the same time, you cannot really tell whether a node is partitioned away, crashed, or just slow. Is it threshold based, i.e. I missed N heartbeats, so re-replicate? Which package in the source code could I look at to glean this information?

Thanks
A

On 7/17/07, Phantom <[EMAIL PROTECTED]> wrote:
>
> That's awesome.
>
> Thanks
> A
>
> On 7/17/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
> >
> > Phantom wrote:
> > > Here is the scenario I was concerned about. Consider three nodes in the
> > > system A, B and C which are placed say in different racks. Let us say that
> > > the disk on A fries up today. Now the blocks that were stored on A are not
> > > going to be re-replicated (this is my understanding but I could be wrong in
> > > this assumption) to some other node or to the new disk with which you would
> > > bring back A.
> >
> > That's incorrect. When a datanode fails to send a heartbeat to the
> > namenode in a timely manner then its data is assumed missing and is
> > re-replicated. And when block corruption is detected, corrupt replicas
> > are removed and non-corrupt replicas are re-replicated to maintain the
> > desired level of replication.
> >
> > Doug
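As an illustration of the threshold idea raised in this thread, a timeout-based failure detector could look roughly like the sketch below. This is not the Hadoop implementation: the class and function names, the 3 second interval, and the threshold of 10 missed heartbeats are all hypothetical values chosen for the example. A node is declared dead only after it has been silent for longer than `interval_s * missed_threshold`, and only then are its blocks counted as needing extra replicas.

```python
class HeartbeatMonitor:
    """Toy failure detector: a node is dead after N missed heartbeat intervals."""

    def __init__(self, interval_s=3.0, missed_threshold=10):
        # Both values are assumptions for illustration, not Hadoop defaults.
        self.expiry_s = interval_s * missed_threshold
        self.last_seen = {}

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now` (seconds)."""
        self.last_seen[node] = now

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the expiry window."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.expiry_s)


def under_replicated(block_locations, dead, replication):
    """Map each block to the number of replicas it is missing.

    `block_locations` maps a block id to the set of nodes holding a replica;
    replicas on dead nodes are treated as lost.
    """
    missing = {}
    for block, nodes in block_locations.items():
        live = len(nodes - set(dead))
        if live < replication:
            missing[block] = replication - live
    return missing


monitor = HeartbeatMonitor()
for node in ("A", "B", "C", "D"):
    monitor.heartbeat(node, now=0.0)
for node in ("B", "C", "D"):
    monitor.heartbeat(node, now=30.0)  # A has gone silent

dead = monitor.dead_nodes(now=31.0)
blocks = {"blk_1": {"A", "B", "C"}, "blk_2": {"B", "C", "D"}}
print(dead, under_replicated(blocks, dead, replication=3))
```

A single missed heartbeat changes nothing in this scheme, which avoids re-replicating for a node that is merely slow; the cost is that a genuinely failed node goes undetected until the window expires.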
