[
https://issues.apache.org/jira/browse/CASSANDRA-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776519#action_12776519
]
Jun Rao commented on CASSANDRA-193:
-----------------------------------
First, thanks, Stu, for this big patch; it represents a lot of work. Here are
some review comments.
1. The high-level question: when should the Merkle tree be computed? The patch
piggybacks the computation on a regular compaction. Even if it were moved to
major compaction, it would still not be enough, because there is an upper limit
on file size. Therefore, not all sstables are necessarily read during a major
compaction, which means the Merkle tree may not see all keys in a particular
key range.
One approach is to explicitly iterate through the keys of all sstables in a
particular range, compute the Merkle tree, and send it to the replicas. Each
replica then computes its own Merkle tree and does the comparison. We can
trigger this process through a nodeprobe command.
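A minimal Java sketch of the explicit approach in point 1, under stated assumptions: tokens are modeled as plain `long` values in `[0, maxToken)`, the `rows` map stands in for a merged scan over all sstables in the range, the tree has a fixed depth rather than adaptive splitting, and the class `RangeMerkleTree` and its methods are hypothetical (this is not the patch's MerkleTree API). Each node would build its tree locally; shipping the tree to a replica is left out, and `difference` returns the leaf ranges that would need repair.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not Cassandra's MerkleTree: a complete binary tree of fixed
// depth over tokens [0, maxToken), stored in heap order (children of i: 2i+1, 2i+2).
class RangeMerkleTree {
    private final long maxToken;
    private final byte[][] nodes;

    // 'rows' stands in for the merged scan of all sstables: token -> serialized row.
    RangeMerkleTree(long maxToken, int depth, SortedMap<Long, String> rows) {
        int leaves = 1 << depth;
        if (maxToken % leaves != 0) throw new IllegalArgumentException("uneven leaves");
        this.maxToken = maxToken;
        this.nodes = new byte[2 * leaves - 1][];
        long width = maxToken / leaves;
        for (int leaf = 0; leaf < leaves; leaf++) {
            // One digest per leaf over its rows, visited in token order.
            MessageDigest md = sha256();
            long lo = leaf * width;
            for (Map.Entry<Long, String> e : rows.subMap(lo, lo + width).entrySet()) {
                byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
                // Token and value length prefix keep adjacent rows from running together.
                md.update(ByteBuffer.allocate(12).putLong(e.getKey()).putInt(value.length).array());
                md.update(value);
            }
            nodes[leaves - 1 + leaf] = md.digest();
        }
        for (int i = leaves - 2; i >= 0; i--) { // internal node = H(left || right)
            MessageDigest md = sha256();
            md.update(nodes[2 * i + 1]);
            md.update(nodes[2 * i + 2]);
            nodes[i] = md.digest();
        }
    }

    // Returns the [lo, hi) leaf ranges whose hashes differ; assumes both trees
    // were built with the same maxToken and depth.
    List<long[]> difference(RangeMerkleTree other) {
        List<long[]> out = new ArrayList<>();
        diff(other, 0, 0, maxToken, out);
        return out;
    }

    private void diff(RangeMerkleTree other, int i, long lo, long hi, List<long[]> out) {
        if (Arrays.equals(nodes[i], other.nodes[i])) return; // whole subtree agrees
        if (2 * i + 1 >= nodes.length) { out.add(new long[] {lo, hi}); return; } // leaf
        long mid = lo + (hi - lo) / 2;
        diff(other, 2 * i + 1, lo, mid, out);
        diff(other, 2 * i + 2, mid, hi, out);
    }

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java SE platform provides SHA-256
        }
    }

    public static void main(String[] args) {
        SortedMap<Long, String> local = new TreeMap<>(Map.of(5L, "a", 300L, "b", 900L, "c"));
        SortedMap<Long, String> replica = new TreeMap<>(local);
        replica.put(300L, "stale");
        for (long[] r : new RangeMerkleTree(1024, 3, local)
                .difference(new RangeMerkleTree(1024, 3, replica))) {
            System.out.println("repair [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Running `main` prints `repair [256, 384)`, the only leaf whose rows disagree. Comparing from the root down means any subtree whose hashes match is skipped without visiting its leaves.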
I spent quite some time reading through the code, and I am still confused in
several places. Perhaps more descriptive text on each main method (e.g.,
split, validate, difference) would help.
2. It's not clear to me exactly how splitting in the Merkle tree works.
2.1 In MerkleTree.Node.insert, why do you increment the depth of the left child
even when the node doesn't split?
2.2 In the same function, if the node does split, where is the code that
shrinks the children list of the split node by half?
2.3 In the same function, do you have to keep calling invalidate during
insertion? It seems to me that it would be simpler to first split the tree
into the shape you want, then make a single pass over the tree to invalidate
all nodes before computing the hashes.
3. I am not entirely clear on how the Validator works.
3.1 In Validator.add, there is a comment about generating a new range, but no
code does that.
3.2 In TreeRange.validateHelper, you are trying to compute the hash for a set
of rows in a range. Why do you have to compute multiple hash values recursively?
4. I need some descriptive text to really follow the Differencer code.
5. The Hashable class is confusing. Judging by its name, I expect it to be just
about the hash. However, its comparator actually operates on the token, so
HashableToken is probably a better name.
6. The repair logic is missing in Differencer.
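The two-phase alternative suggested in 2.3, together with the one-hash-per-range question in 3.2, could look roughly like the following Java sketch. All names (`LazyMerkleTree`, `split`, `invalidateAll`, `computeHashes`) are hypothetical and tokens are again a simplified `long` range; this illustrates the reviewer's suggestion, not the patch's code. `split` changes only the shape, `invalidateAll` clears every cached hash in one pass, and `computeHashes` gives each leaf exactly one digest over its rows and each internal node the digest of its two children.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical two-phase tree: shape first (split), then one invalidation pass, then hashing.
class LazyMerkleTree {
    static final class Node {
        final long lo, hi; // token range [lo, hi)
        Node left, right;  // both null for a leaf
        byte[] hash;       // null = invalid, must be recomputed
        Node(long lo, long hi) { this.lo = lo; this.hi = hi; }
        boolean isLeaf() { return left == null; }
    }

    private final Node root;

    LazyMerkleTree(long maxToken) { root = new Node(0, maxToken); }

    // Phase 1: change the shape only. Split the leaf covering 'token' at its midpoint.
    void split(long token) {
        Node n = root;
        while (!n.isLeaf()) n = token < n.left.hi ? n.left : n.right;
        if (n.hi - n.lo < 2) return; // too narrow to split
        long mid = n.lo + (n.hi - n.lo) / 2;
        n.left = new Node(n.lo, mid);
        n.right = new Node(mid, n.hi);
    }

    // Phase 2: a single pass that marks every node invalid.
    void invalidateAll() { invalidate(root); }

    private static void invalidate(Node n) {
        if (n == null) return;
        n.hash = null;
        invalidate(n.left);
        invalidate(n.right);
    }

    // Phase 3: recompute invalid hashes bottom-up; each leaf gets one digest over its rows.
    byte[] computeHashes(SortedMap<Long, String> rows) { return hash(root, rows); }

    private static byte[] hash(Node n, SortedMap<Long, String> rows) {
        if (n.hash != null) return n.hash; // still valid, reuse
        MessageDigest md = sha256();
        if (n.isLeaf()) {
            for (Map.Entry<Long, String> e : rows.subMap(n.lo, n.hi).entrySet()) {
                byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
                md.update(ByteBuffer.allocate(12).putLong(e.getKey()).putInt(value.length).array());
                md.update(value);
            }
        } else {
            md.update(hash(n.left, rows));
            md.update(hash(n.right, rows));
        }
        n.hash = md.digest();
        return n.hash;
    }

    int leafCount() { return count(root); }

    private static int count(Node n) { return n.isLeaf() ? 1 : count(n.left) + count(n.right); }

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LazyMerkleTree tree = new LazyMerkleTree(1024);
        tree.split(0); // [0,512) [512,1024)
        tree.split(0); // [0,256) [256,512) [512,1024)
        SortedMap<Long, String> rows = new TreeMap<>(Map.of(10L, "a", 700L, "b"));
        byte[] before = tree.computeHashes(rows);
        rows.put(300L, "c");
        tree.invalidateAll();
        byte[] after = tree.computeHashes(rows);
        System.out.println(tree.leafCount() + " leaves, root changed: " + !Arrays.equals(before, after));
    }
}
```

`main` prints `3 leaves, root changed: true`. The trade-off is that `split` stays cheap because it never touches hashes, but any split or data change made after hashing must be followed by `invalidateAll` before the next `computeHashes`, or stale cached hashes will be reused.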
> Proactive repair
> ----------------
>
> Key: CASSANDRA-193
> URL: https://issues.apache.org/jira/browse/CASSANDRA-193
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Stu Hood
> Fix For: 0.5
>
> Attachments: 193-1-tree-preparation.diff, 193-2-tree.diff,
> 193-3-aes-preparation.diff, 193-4-aes.diff
>
>
> Currently cassandra supports "read repair," i.e., lazy repair when a read is
> done. This is better than nothing but is not sufficient for some cases (e.g.
> catastrophic node failure where you need to rebuild all of a node's data on a
> new machine).
> Dynamo uses Merkle trees here. This is harder for Cassandra given the CF
> data model, but I suppose we could just hash the serialized CF value.
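As a rough illustration of hashing the serialized CF value, here is a Java sketch that assumes a column family can be modeled as a sorted map from column name to value (real column families also carry timestamps and deletion markers, which this ignores). The class `CfHash` and its serialization format are hypothetical, not Cassandra's serializer. The property that matters is that serialization is canonical: columns are written in sorted order with length prefixes, so replicas holding the same data produce the same digest.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model: a column family as column name -> value (no timestamps or tombstones).
class CfHash {
    // Serialize in sorted column order so equal CFs produce identical bytes.
    static byte[] serialize(SortedMap<String, String> columns) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(columns.size());
            for (Map.Entry<String, String> c : columns.entrySet()) {
                out.writeUTF(c.getKey());   // length-prefixed, so fields cannot run together
                out.writeUTF(c.getValue()); // writeUTF rejects strings over 65535 encoded bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] hash(SortedMap<String, String> columns) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(serialize(columns));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SortedMap<String, String> a = new TreeMap<>();
        a.put("name", "x"); a.put("age", "7");
        SortedMap<String, String> b = new TreeMap<>();
        b.put("age", "7"); b.put("name", "x"); // same columns, different insertion order
        System.out.println(Arrays.equals(hash(a), hash(b)));
    }
}
```

`main` prints `true` even though the two maps were filled in different orders, because `TreeMap` iterates in key order.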
--
This message is automatically generated by JIRA.