Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "AntiEntropy" page has been changed by StuHood.
http://wiki.apache.org/cassandra/AntiEntropy?action=diff&rev1=2&rev2=3

--------------------------------------------------

  Cassandra's implementation is modeled on Dynamo's, with 
[[ArchitectureAntiEntropy|modifications to support the richer data model]].  
Quoting from 
[[http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html|Amazon's 
Dynamo]] section 4.7,
  
-  . To detect the inconsistencies between replicas faster and to minimize the 
amount of transferred data, Dynamo uses Merkle trees [13]. A Merkle tree is a 
hash tree where leaves are hashes of the values of  individual keys. Parent 
nodes higher in the tree are hashes of their respective  children. The 
principal advantage of Merkle tree is that each branch of the tree  can be 
checked independently without requiring nodes to download the entire  tree or 
the entire data set. Moreover, Merkle trees help in reducing the amount of  
data that needs to be transferred while checking for inconsistencies among  
replicas. For instance, if the hash values of the root of two trees are equal,  
then the values of the leaf nodes in the tree are equal and the nodes require 
no synchronization. If not, it implies that the values of some replicas are 
different. In such cases, the nodes may exchange the hash values of  children 
and the process continues until it reaches the leaves of the trees, at which  
point the hosts can identify the keys that are “out of sync”. Merkle trees  
minimize the amount of data that needs to be transferred for synchronization 
and  reduce the number of disk reads performed during the anti-entropy process.
+  . To detect the inconsistencies between replicas faster and to minimize the 
amount of transferred data, Dynamo uses Merkle trees [13]. A Merkle tree is a 
hash tree where leaves are hashes of the values of  individual keys. Parent 
nodes higher in the tree are hashes of their respective  children. The 
principal advantage of Merkle tree is that each branch of the tree  can be 
checked independently without requiring nodes to download the entire [...] data 
set.
  
-  . Dynamo uses Merkle trees for anti-entropy as follows: Each node maintains 
a separate Merkle tree for each key range (the set of keys  covered by a 
virtual node) it hosts. This allows nodes to compare whether the keys  within a 
key range are up-to-date. In this scheme, two nodes exchange the root  of the 
Merkle tree corresponding to the key ranges that they host in common.  
Subsequently, using the tree traversal scheme described above the nodes 
determine if  they have any differences and perform the appropriate 
synchronization action.
+ The key difference in Cassandra's implementation of anti-entropy is that the 
Merkle trees are built per column family, and they are not maintained for 
longer than it takes to send them to neighboring nodes. Instead, the trees are 
generated as snapshots of the dataset during major compactions: this means that 
excess data might be sent across the network, but it saves local disk IO, and 
is preferable for very large datasets.
  

Reply via email to