[ 
https://issues.apache.org/jira/browse/CASSANDRA-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stu Hood updated CASSANDRA-193:
-------------------------------

    Attachment: mktree-and-binary-tree.png

> 3.2 Why is Merkle tree a binary tree? I thought it's an n-ary tree since you
> can specify the order of each node.
This MerkleTree data structure is an n-ary tree that represents a perfect 
binary hash tree.

> Since a leaf node corresponds to a single range, why do you need a list of 
> hashes?
> I thought each leaf node maintains a single hash value for rows its range and 
> each
> inner node maintains a single hash that is the logical AND of the hashes of 
> all its
> children. Is this not correct?
That is not correct... I'm sorry, I mispoke above... I'll edit that comment and 
remove the bad information. It is the leaves of the perfect tree which contain 
sequentially hashed values, and the inner nodes of the perfect tree contain the 
logical AND hash of their children.

I'm attaching an example. This perfect binary tree has a max depth of 2, so the 
maximum depth of a MkT.Hash in the MkTree representing it is 2 as well. In this 
case though, the MkTree has only been split twice, so it contains a single 
MkT.Leaf and three MkT.Hashes. Each MkT.Hash represents a node from the perfect 
tree, but they are not all at the same depth in the tree. MkT.Nodes will not 
always correspond to a node from the binary tree (because the orders are not 
equal), but when they do, we can cache a value in that MkT.Node.

The important thing to notice, is that if the MkTree had been split at 8 and 12 
instead of 4 and 8, it would still represent the same perfect binary tree, 
which is why we can compare MkTrees generated by different Cassandra endpoints.

The MkTree.Hash for (8, 0] represents an inner node in the perfect tree, so in 
order to calculate it in the same manner no matter how a Cassandra endpoint 
chooses to split the MkTree, during validateHelper() we always generate a 
perfect tree down to the maximum hash depth, sequentially hash rows into the 
perfect leaves, and then calculate the logical AND back up to the range needed 
by validate().

> 2.1 Still confused. So node.depth doesn't mean the depth of the node in the 
> tree?
All depths in the code refer to the depth of the perfect binary tree that a 
Node or Hash represents. This is why MkT.Nodes can only occasionally contain 
hash values: it's fairly common for a N-ary tree to contain a node representing 
a range that isn't represented by a single node in a binary tree.

> 3.1 The ranges don't change after the iterator is generated, right? But 
> inside the
> while loop in Validator.add, there is comment about adding a new range.
The comment says "generate a new range": it is asking for 
TreeRangeIterator.next(), which hides the generation of the next range that 
needs to be hashed, and returns it.

----

I hope this helps! Thanks again for checking it out.

> Proactive repair
> ----------------
>
>                 Key: CASSANDRA-193
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-193
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stu Hood
>             Fix For: 0.5
>
>         Attachments: 193-1-tree-preparation.diff, 193-2-tree.diff, 
> 193-3-aes-preparation.diff, 193-4-aes.diff, mktree-and-binary-tree.png
>
>
> Currently cassandra supports "read repair," i.e., lazy repair when a read is 
> done.  This is better than nothing but is not sufficient for some cases (e.g. 
> catastrophic node failure where you need to rebuild all of a node's data on a 
> new machine).
> Dynamo uses merkle trees here.  This is harder for Cassandra given the CF 
> data model but I suppose we could just hash the serialized CF value.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to