[jira] [Updated] (CASSANDRA-9876) One way targeted repair

Geoffrey Yu (JIRA) Tue, 09 Aug 2016 18:49:51 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Geoffrey Yu updated CASSANDRA-9876:
-----------------------------------
    Attachment: 9876-dtest-master.txt
                9876-trunk-v2.txt

Thanks for the quick review! I’ve attached a new patch that addresses your 
comments, with the exception of one of them for which I wanted to get some more 
feedback first.

I also attached a patch that adds one dtest to test the pull repair. It works 
nearly identically to the token range repair with the exception that it asserts 
that one of the nodes only sends data and the other only receives.

{quote}
I don't think it's necessary to make specifying --start-token and --end-token 
mandatory, since if that is not specified it will just pull repair all common 
ranges between specified hosts.
{quote}

The reason why I added in the check for a token range was that the repair code 
as it is now doesn’t actually add only the common ranges between the specified 
hosts. I wasn’t sure if this is was the intended behavior or a bug.

To replicate the issue, just create a 3 node cluster, add a keyspace with 
replication factor 2, and run a regular repair through nodetool on that 
keyspace with exactly two nodes specified.

The reason it happens is that if no ranges are specified, the repair will [add 
all ranges on the local 
node|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3137].
 Then when we hit {{RepairRunnable}}, we try to find a list of neighbors for 
each range 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairRunnable.java#L160-L162).

The problem here is that it isn’t always true that every range the local node 
owns is also owned by the remote node we specified through the nodetool 
command. In the example above, only one range will be common between any two 
nodes. Because of this the [check 
here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L246-L251]
 may result in an exception being thrown, which aborts the repair.

If this is intended behavior, then forcing the user to specify a token range 
that is common between the nodes prevents that exception from being thrown. 
Otherwise the error message, “Repair requires at least two endpoints that are 
neighbours before it can continue” can be confusing to the operator since the 
two specified nodes may actually share a common range. What do you think?

> One way targeted repair
> -----------------------
>
>                 Key: CASSANDRA-9876
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9876
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Geoffrey Yu
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 9876-dtest-master.txt, 9876-trunk-v2.txt, 9876-trunk.txt
>
>
> Many applications use C* by writing to one local DC. The other DC is used 
> when the local DC is unavailable. When the local DC becomes available, we 
> want to run a targeted repair b/w one endpoint from each DC to minimize the 
> data transfer over WAN.  In this case, it will be helpful to do a one way 
> repair in which data will only be streamed from other DC to local DC instead 
> of streaming the data both ways. This will further minimize the traffic over 
> WAN. This feature should only be supported if a targeted repair is run 
> involving 2 hosts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9876) One way targeted repair

Reply via email to