[jira] Commented: (CASSANDRA-193) Proactive repair

Jun Rao (JIRA) Wed, 11 Nov 2009 09:22:11 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776519#action_12776519 ]


Jun Rao commented on CASSANDRA-193:
-----------------------------------

First, thanks Stu for this big patch. This is a lot of work. Here are some 
review comments.
1. The high-level question: when should the Merkle tree be computed? The patch 
piggybacks the computation on a regular compaction. Even if it's moved to major 
compaction, it's still not enough. This is because there is an upper limit on 
file size. Therefore, not all sstables are necessarily read during a major 
compaction, which means the Merkle tree may not see all keys in a particular 
key range.

One approach is to explicitly iterate through the keys of all sstables in a 
particular range, compute the Merkle tree, and send it to the replicas. Each 
replica then computes its own Merkle tree and does the comparison. We can 
trigger this process through a nodeprobe command.
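
To make the idea concrete, here is a rough sketch in Java. It is not code 
from the patch: the class name RangeMerkleSketch, the fixed depth, the integer 
token space, and hashing string row values with SHA-256 are all assumptions 
made for illustration. Each side builds a tree from its own rows in token 
order, and the comparison descends only into subtrees whose hashes differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical sketch: a fixed-depth Merkle tree over an integer token range. */
final class RangeMerkleSketch {
    static final int DEPTH = 4;                      // assumed: 2^4 = 16 leaf ranges
    static final int LEAVES = 1 << DEPTH;
    static final int TOKEN_SPACE = 1 << 16;          // assumed: tokens lie in [0, 65536)
    static final int LEAF_WIDTH = TOKEN_SPACE / LEAVES;

    // Heap layout: hashes[1] is the root, node i has children 2i and 2i+1,
    // and the leaves occupy indices LEAVES .. 2*LEAVES-1.
    private final byte[][] hashes = new byte[2 * LEAVES][];

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds the tree from every row in the range, visited in token order. */
    static RangeMerkleSketch build(SortedMap<Integer, String> rows) {
        RangeMerkleSketch tree = new RangeMerkleSketch();
        MessageDigest[] leaves = new MessageDigest[LEAVES];
        for (int i = 0; i < LEAVES; i++) leaves[i] = sha256();
        for (Map.Entry<Integer, String> e : rows.entrySet()) {
            String row = e.getKey() + "=" + e.getValue() + "\n";
            leaves[e.getKey() / LEAF_WIDTH].update(row.getBytes(StandardCharsets.UTF_8));
        }
        for (int i = 0; i < LEAVES; i++) tree.hashes[LEAVES + i] = leaves[i].digest();
        // An inner node's hash covers the hashes of its two children.
        for (int i = LEAVES - 1; i >= 1; i--) {
            MessageDigest md = sha256();
            md.update(tree.hashes[2 * i]);
            md.update(tree.hashes[2 * i + 1]);
            tree.hashes[i] = md.digest();
        }
        return tree;
    }

    /** Returns the [start, end) token ranges whose leaf hashes differ. */
    List<int[]> difference(RangeMerkleSketch other) {
        List<int[]> out = new ArrayList<>();
        collect(1, other, out);
        return out;
    }

    private void collect(int node, RangeMerkleSketch other, List<int[]> out) {
        if (Arrays.equals(hashes[node], other.hashes[node])) return; // subtree agrees
        if (node >= LEAVES) {
            int start = (node - LEAVES) * LEAF_WIDTH;
            out.add(new int[] {start, start + LEAF_WIDTH});
            return;
        }
        collect(2 * node, other, out);
        collect(2 * node + 1, other, out);
    }
}
```

A replica that receives the remote tree would call difference() against its 
own tree and repair only the returned ranges. What "repair" means for those 
ranges is left open here.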

I spent quite some time reading through the code and I am still confused in 
several places. Perhaps a short text description of each main method (e.g., 
split, validate, difference) would help.

2. It's not clear to me exactly how splitting in the Merkle tree works.
2.1 In MerkleTree.Node.insert, why do you increment the depth of the left child 
even when the node doesn't split?
2.2 In the same function, if the node does split, where is the code to shrink 
the children list in the split node by half?
2.3 In the same function, do you have to keep calling invalidate during 
insertion? It seems to me that it would be simpler if you first split the tree 
to what you want, then make a pass over the tree to invalidate all nodes before 
computing the hashes.
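
A rough sketch of the ordering suggested in 2.3, in Java. The node type below 
(SplitThenHashSketch) is hypothetical and is not the patch's MerkleTree.Node; 
the integer token ranges, string row values, and SHA-256 are likewise 
assumptions. The point is only the three separate phases: shape the tree, 
invalidate it in one pass, then compute hashes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical node type, used only to illustrate the phase ordering. */
final class SplitThenHashSketch {
    final int lo, hi;                 // token range [lo, hi)
    SplitThenHashSketch left, right;  // both null for a leaf
    byte[] hash;                      // null means "invalid, must be recomputed"

    SplitThenHashSketch(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    /** Phase 1: shape the tree only; no hash is read or written. */
    void splitTo(int depth) {
        if (depth == 0 || hi - lo < 2) return;
        int mid = lo + (hi - lo) / 2;
        left = new SplitThenHashSketch(lo, mid);
        right = new SplitThenHashSketch(mid, hi);
        left.splitTo(depth - 1);
        right.splitTo(depth - 1);
    }

    /** Phase 2: a single pass that marks every node invalid. */
    void invalidateAll() {
        hash = null;
        if (left != null) {
            left.invalidateAll();
            right.invalidateAll();
        }
    }

    /** Phase 3: compute hashes bottom-up; leaves hash the rows in their range. */
    byte[] computeHash(SortedMap<Integer, String> rows) {
        if (hash != null) return hash;
        MessageDigest md = sha256();
        if (left == null) {
            for (Map.Entry<Integer, String> e : rows.subMap(lo, hi).entrySet()) {
                String row = e.getKey() + "=" + e.getValue() + "\n";
                md.update(row.getBytes(StandardCharsets.UTF_8));
            }
        } else {
            md.update(left.computeHash(rows));
            md.update(right.computeHash(rows));
        }
        hash = md.digest();
        return hash;
    }

    int leafCount() {
        return left == null ? 1 : left.leafCount() + right.leafCount();
    }

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this ordering, insertion never has to touch hashes at all; whether that 
fits the way the patch feeds rows into the tree is something Stu would know 
better.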

3. I am not exactly clear on how the validator works.
3.1 In Validator.add, there is a comment about generating a new range. However, 
no code does that.
3.2 In TreeRange.validateHelper, you are trying to compute the hash for a set 
of rows in a range. Why do you have to compute multiple hash values recursively?

4. I need some text description to really follow the Differencer code.

5. The Hashable class is confusing. By its name, I expect it to be really about 
just the hash. However, the comparator is actually on the token. HashableToken 
is probably a better name.

6. The repair logic is missing in Differencer.


> Proactive repair
> ----------------
>
>                 Key: CASSANDRA-193
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-193
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stu Hood
>             Fix For: 0.5
>
>         Attachments: 193-1-tree-preparation.diff, 193-2-tree.diff, 
> 193-3-aes-preparation.diff, 193-4-aes.diff
>
>
> Currently cassandra supports "read repair," i.e., lazy repair when a read is 
> done.  This is better than nothing but is not sufficient for some cases (e.g. 
> catastrophic node failure where you need to rebuild all of a node's data on a 
> new machine).
> Dynamo uses merkle trees here.  This is harder for Cassandra given the CF 
> data model but I suppose we could just hash the serialized CF value.

