[ 
https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236448#comment-13236448
 ] 

Sylvain Lebresne commented on CASSANDRA-3912:
---------------------------------------------

Note that allowing the repair of an arbitrary range can actually be 
problematic. What we should do is only allow the repair of a range that is 
fully contained in a replica range.
Consider 2 nodes A and B, where A is a replica for ranges (0, 100] and 
(100, 200], and B for ranges (100, 200] and (200, 300]. Say we ask to repair 
the range (50, 150]. The problem is that the merkle tree for that repair may 
end up hashing the range (95, 105], but on that range A and B cannot have the 
same hash (because A will have data on (95, 100] but B won't). It's not a huge 
deal, but there is no reason to let users shoot themselves in the foot.
I see 2 options:
* we simply throw an IllegalArgumentException if the user provides a range that 
is not fully contained in a replica range.
* we accept such a range, but split it and trigger multiple repairs
My personal preference goes to the first option.
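
To make the two options concrete, here is a minimal sketch. It is not taken 
from the attached patches: the TokenRange and RepairRangeCheck classes and 
their methods are hypothetical, tokens are plain longs, and ranges are assumed 
not to wrap around the ring (left < right) and to be disjoint.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical half-open token range (left, right]; no ring wrap-around in this sketch. */
class TokenRange {
    final long left;
    final long right;

    TokenRange(long left, long right) {
        if (left >= right)
            throw new IllegalArgumentException("empty or wrapping range not handled in this sketch");
        this.left = left;
        this.right = right;
    }

    /** True if 'other' lies entirely inside this range. */
    boolean contains(TokenRange other) {
        return left <= other.left && other.right <= right;
    }

    @Override
    public String toString() {
        return "(" + left + ", " + right + "]";
    }
}

class RepairRangeCheck {
    /** Option 1: reject a requested range that is not fully contained in one replica range. */
    static void validate(TokenRange requested, List<TokenRange> replicaRanges) {
        for (TokenRange replica : replicaRanges) {
            if (replica.contains(requested))
                return;
        }
        throw new IllegalArgumentException(
            "Requested range " + requested + " is not fully contained in a replica range");
    }

    /**
     * Option 2: split the requested range along replica range boundaries.
     * Assumes the replica ranges are disjoint; any part of the requested
     * range not covered by a replica range is silently dropped here.
     */
    static List<TokenRange> split(TokenRange requested, List<TokenRange> replicaRanges) {
        List<TokenRange> pieces = new ArrayList<>();
        for (TokenRange replica : replicaRanges) {
            long left = Math.max(requested.left, replica.left);
            long right = Math.min(requested.right, replica.right);
            if (left < right)
                pieces.add(new TokenRange(left, right));
        }
        return pieces;
    }
}
```

With A's ranges (0, 100] and (100, 200] from the example above, validate 
rejects (50, 150], while split turns it into (50, 100] and (100, 150], each 
of which could then be repaired separately.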

bq. Is it worth evaluating using the range scan code to compute the trees 
instead of an sstable-only scanner?

The problem is that for repair to not be overly inefficient, we need to make 
sure we compare "snapshots" of the data taken at "the same time". The flushing 
ensures that (it's obviously not exactly "at the same time", but it's as good 
as we can get). I suppose you could say that if you repair very small ranges 
at a time, that problem would be minimized, and maybe that is true in practice, 
but that still feels fairly fragile to me (and would require very careful 
testing to make sure it is acceptable in practice).
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, 
> CASSANDRA-3912-v2-001-add-nodetool-commands.txt, 
> CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's precursor to CASSANDRA-2699, exposing the ability to repair 
> small parts of a range is extremely useful because it allows (with external 
> scripting logic) to slowly repair a node's content over time. Other than 
> avoiding the bulkiness of complete repairs, it means that you can safely do 
> repairs even if you absolutely cannot afford e.g. disk space spikes (see 
> CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, 
> where you specify a step and the number of total steps. Incrementally 
> performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and 
> when. This should (1) allow incremental repair to happen now/soon, and 
> (2) allow experimentation and evaluation for an implementation of 
> CASSANDRA-2699 which I still think is a good idea. This patch does nothing to 
> help the average deployment, but at least makes incremental repair possible 
> given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to 
> RandomPartitioner and BigIntegerToken. If someone can suggest a way to 
> implement this command generically using the Range/Token abstractions, I'd be 
> happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to 
> simply specify the specific token ranges on the command line. It makes using 
> it a bit more difficult, but would mean that it works for any partitioner and 
> token type.
> Unless someone can suggest a better way to do this, I think I'll provide a 
> patch that does this. I'm still leaning towards supporting the simple "step N 
> out of M" form though.
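
For illustration, the "step N out of M" form over BigInteger tokens could 
compute its sub-range roughly as below. This is a sketch under assumptions, 
not the code in the attached patches: the StepRange class and subRange method 
are hypothetical, and the range is assumed not to wrap around the ring.

```java
import java.math.BigInteger;

class StepRange {
    /**
     * Returns {start, end} of the sub-range (start, end] covered by the given
     * step when (left, right] is cut into totalSteps roughly equal parts.
     * Consecutive steps share a boundary, so together they cover the whole range.
     */
    static BigInteger[] subRange(BigInteger left, BigInteger right, int step, int totalSteps) {
        if (totalSteps <= 0 || step < 0 || step >= totalSteps)
            throw new IllegalArgumentException("step must be in [0, totalSteps)");
        if (left.compareTo(right) >= 0)
            throw new IllegalArgumentException("empty or wrapping range not handled in this sketch");

        BigInteger width = right.subtract(left);
        BigInteger n = BigInteger.valueOf(totalSteps);
        // Multiply before dividing so rounding errors do not accumulate across steps.
        BigInteger start = left.add(width.multiply(BigInteger.valueOf(step)).divide(n));
        BigInteger end = left.add(width.multiply(BigInteger.valueOf(step + 1L)).divide(n));
        return new BigInteger[] { start, end };
    }
}
```

If the range holds fewer tokens than totalSteps, some steps come out empty 
(start equals end), which a caller would need to skip.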


        
