[
https://issues.apache.org/jira/browse/CASSANDRA-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723151#action_12723151
]
Stu Hood commented on CASSANDRA-193:
------------------------------------
> The more I think about this the less convinced I am that the
> partially-invalidated live tree is going to be worth the overhead of
> maintaining it (and initializing it on startup).
There is no need to initialize the tree on startup: it can be built lazily when
the first tree-exchange request comes in.
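For what it's worth, the lazy path can be as simple as memoizing the build
behind the first request. A minimal sketch (none of these names exist in the
codebase, it's just the shape of the idea):

import java.util.function.Supplier;

/** Illustrative only: the first exchange request pays the build cost. */
class LazilyBuiltTree<T>
{
    private final Supplier<T> builder;
    private T tree; // null until the first request

    LazilyBuiltTree(Supplier<T> builder)
    {
        this.builder = builder;
    }

    /** First tree-exchange request triggers the build; later requests reuse it. */
    synchronized T get()
    {
        if (tree == null)
            tree = builder.get();
        return tree;
    }
}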
> If you instead just create a mini-merkle tree from the first N keys and
> exchange that with the replica nodes, then repeat for the next N, you still
> get a big win on network traffic (which is the main concern here)...
Yes, network traffic is important, but the whole point of maintaining the tree
in memory is that it saves us from having to read entire SSTables from disk in
order to do repairs (much as BloomFilters do for random lookups). For any
portions of the tree that survive (which should be most of it, assuming we do
invalidations correctly), we can use the SSTable index to seek() past the
corresponding chunks of the file.
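To sketch what "seek() past chunks" means here (hypothetical names, and
assuming each leaf of the in-memory tree remembers the data-file offset of its
key range via the SSTable index): only leaves that were invalidated get re-read
and re-hashed, everything else keeps its cached hash and never touches disk.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;
import java.util.zip.CRC32;

class LeafRevalidator
{
    // One leaf of the in-memory tree, covering a contiguous key range of the SSTable.
    static class Leaf
    {
        boolean valid;     // flipped to false when a write lands in this range
        long dataOffset;   // start of the range in the data file, from the SSTable index
        int dataLength;    // length of the range in the data file
        long hash;         // cached hash of the range
    }

    static void revalidate(List<Leaf> leaves, RandomAccessFile dataFile) throws IOException
    {
        for (Leaf leaf : leaves)
        {
            if (leaf.valid)
                continue; // cached hash still good: no disk read, we seek() past this chunk

            dataFile.seek(leaf.dataOffset);   // jump straight to the dirtied range
            byte[] chunk = new byte[leaf.dataLength];
            dataFile.readFully(chunk);

            CRC32 crc = new CRC32();          // stand-in for a real digest
            crc.update(chunk);
            leaf.hash = crc.getValue();
            leaf.valid = true;
        }
    }
}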
> but you have no startup overhead, no complicated extra maintenance to perform
> on insert, better performance in the worst case and (probably) in the average
> case, since you are avoiding random reads in favor of (a potentially greater
> number of) streaming reads...
* No startup overhead necessary,
* B+Tree invalidations will only involve marking a leaf node invalid: i.e., do
a lookup and increment a counter (a rough sketch follows below this list),
* There won't be any random reads... I'm not sure where you read that: in
order to validate regions of the tree we will iterate over the keys in the CF
in sorted order, skipping regions that are still valid.
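The invalidation bookkeeping in the second bullet could look roughly like this
(illustrative names only, not the actual implementation): the write path does
one floor lookup and one increment, and validation later treats a zero counter
as "skip this region".

import java.util.TreeMap;

class InvalidationIndex
{
    // leaf start key -> number of writes that dirtied the leaf since its last validation
    private final TreeMap<String, Integer> leafCounters = new TreeMap<>();

    void addLeaf(String startKey)
    {
        leafCounters.put(startKey, 0);
    }

    // Called from the write path: a floor lookup plus an increment, nothing more.
    void invalidate(String mutatedKey)
    {
        String leafStart = leafCounters.floorKey(mutatedKey);
        if (leafStart != null)
            leafCounters.merge(leafStart, 1, Integer::sum);
    }

    // Validation skips any leaf that no write has touched.
    boolean isValid(String leafStart)
    {
        return leafCounters.getOrDefault(leafStart, 0) == 0;
    }
}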
> Proactive repair
> ----------------
>
> Key: CASSANDRA-193
> URL: https://issues.apache.org/jira/browse/CASSANDRA-193
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
> Assignee: Stu Hood
> Fix For: 0.5
>
>
> Currently cassandra supports "read repair," i.e., lazy repair when a read is
> done. This is better than nothing but is not sufficient for some cases (e.g.
> catastrophic node failure where you need to rebuild all of a node's data on a
> new machine).
> Dynamo uses merkle trees here. This is harder for Cassandra given the CF
> data model but I suppose we could just hash the serialized CF value.